1 Executive Summary

In this report, we describe our activities related to the private information retrieval (PIR) project.
Our approach is based on efficient implementations of the keyword oblivious transfer
cryptographic primitive, which allows a client and server to negotiate an exchange of data based
on a keyword that the server does not learn. Although no existing protocols allow this primitive to
scale to the magnitude needed by PIR, we utilize a semi-trusted third party, known as the isolated
box, to meet the stated requirements. We implemented our approach in a realistic prototype and
evaluated its performance over a large (60 gigabyte) database and a set of queries provided by
the MIT-LL test team. We found that our approach exceeds the given performance
requirements, with the majority of the performance penalty over plain MySQL processing
coming from encryption and decryption rather than from keyword oblivious transfer.

2  Introduction 

The goals of the Automatic Privacy Protection program are to “develop and demonstrate
practical, sound automated methods for the use of private information retrieval techniques in
Intelligence Community systems, to automatically protect the private data of untargeted
individuals, to assure the mandated policies are enforced, and to enable more effective
interagency and intergovernmental data sharing for improved security.” To this end, we have
pursued the development of efficient protocols for the Private Information Retrieval (PIR)
problem: a client queries a large-scale database on a potentially adversarial server, and learns the
correct answer to its query without leaking any information about it to the server. We have
demonstrated formally that our PIR protocol meets the stated privacy needs of IARPA, and
produced a working prototype. A team of independent testers from MIT-Lincoln Labs has
verified that our prototype is functional and bug-free on a large test corpus, and that it exceeds
IARPA’s minimum performance requirements by more than an order of magnitude. Our
prototype has become even more efficient since the MIT-LL test was conducted.

3  Technical  Approach 

As  per  the  rules  of  engagement  (ROE),  our  system  has  three  primary  components:  a  client,  a 
server,  and  an  isolated  box. 

•  The  server  holds  a  plaintext  database,  consisting  of  an  arbitrary  number  of  rows 
organized  according  to  a  single  schema.  It  communicates  with  the  client  and  isolated  box 
to  provide  responses  to  queries  in  an  oblivious  fashion. 

Assumptions:  It  is  assumed  that  the  server  is  honest-but-curious;  it  follows  the  protocol, 
and  does  not  collude  with  the  isolated  box,  but  may  attempt  to  learn  more  about  the 
contents  of  the  client’s  queries  by  running  additional  algorithms  over  its  view  (messages 
exchanged  during  the  protocol). 

Guarantees: The server learns no information from processing a query. The keyword
oblivious transfer (KOT) protocol used to match a query guarantees that the server does
not learn either the field over which the query is performed or the field value targeted by
the query. Furthermore, after completing the protocol, the server has provided the client
with the information needed to retrieve, in clear text, exactly the rows of the database
corresponding to the query.

• The client issues queries to the server, each of which takes the form of a single attribute-value
pair. The attribute is an element of the schema, and the value is to be
matched by each row in the query’s result set.

Assumptions: It is assumed that the client is honest-but-curious, and does not adaptively
select queries to learn more about the database than intended by the policy.

Guarantees: Upon completing a query, the client learns the following information about
the database: the number of rows matching its query, the plaintext of each matching row,
and the size of the database. It does not learn the plaintext of any rows not matching its
query, and there are no rows in the database that match its query for which it does not
learn the plaintext.

• The isolated box maintains an encrypted, permuted version of the server’s database.
Intuitively, the isolated box serves as an oblivious storage point that allows our protocol
to optimize the amount of network traffic transferred in the course of serving a query.

Assumptions: It is assumed that the isolated box is honest-but-curious. It does not
collude with the client to learn more about the server’s database, and it does not collude
with the server to learn more about the client’s query.

Guarantees: The isolated box can learn the approximate frequency with which an
individual encrypted, permuted record of the database is accessed. Note that the isolated
box does not learn the contents of the record, and the frequency-of-access information is
not perfect due to randomness introduced by the client.

3.1  Definitions 

A database $D$ is a set of records indexed by $t$ attributes $A_1, \ldots, A_t$, where we identify an
attribute $A_i$ with the set of attribute values it may take. Each record $r$ of the database takes the
form $r = (x_1, \ldots, x_t, y)$, with each $x_i \in A_i$ denoting an attribute value and $y \in \{0, 1\}^{\ell}$ (for some
length parameter $\ell$) being the payload. We assume a database contains no duplicate records.
Queries are of the form $(i, x)$, where $1 \le i \le t$ and $x \in A_i$. The query $q = (i, x)$ represents a
request for all records whose value in the $i$th attribute is $x$. (For such a query, we call $i$ the
relevant attribute and $x$ the keyword.) Formally, for the query $q = (i, x)$ on a database $D$ we
define $q(D) = \{(x_1, \ldots, x_t, y) \in D \mid x_i = x\}$ as the set of records that match the query.
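
To make the notation concrete, the following minimal Python sketch models a record as a tuple of attribute values plus a payload and evaluates q(D) for a query (i, x). The names `Record` and `match` and the sample data are hypothetical and are not taken from the prototype; indices are 1-based to follow the text.

```python
from typing import NamedTuple

class Record(NamedTuple):
    attrs: tuple      # (x_1, ..., x_t): one value per attribute
    payload: bytes    # y: opaque payload

def match(db, i, x):
    """Return q(D) for q = (i, x): every record whose i-th attribute is x.

    i is 1-based, following the definition 1 <= i <= t.
    """
    return {r for r in db if r.attrs[i - 1] == x}

# Hypothetical database with t = 2 attributes: (state, city).
D = {
    Record(("NY", "Albany"), b"row-a"),
    Record(("NY", "Buffalo"), b"row-b"),
    Record(("WI", "Madison"), b"row-c"),
}

# Relevant attribute 1 (state), keyword "NY".
result = match(D, 1, "NY")
```

Because D is a set, the no-duplicates assumption holds by construction in this sketch.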

3.2  Primitives 

We  assume  a  semantically-secure  encryption  scheme  E  =  (E,  U)  defined  over  (K,  M,  C),  where 
K  is  the  keyspace,  M  is  the  space  of  plaintexts,  and  C  is  the  space  of  ciphertexts.  Additionally, 
we assume a cryptographically-secure hash function, and previously-established RSA credentials
(N, e, d) for the server (N is the modulus, e is the public exponent, and d is the private exponent).

Finally, we assume a keyword oblivious transfer (KOT) scheme OTkl . km, the security of which is
based on the intractability of the one-more-RSA-inversion problem.
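
As a rough illustration of how RSA credentials (N, e, d) can be used without revealing a keyword to the server, the sketch below shows textbook RSA blinding in Python. It is a sketch of the general blinding idea only, not the KOT scheme itself: the tiny primes, the hash-to-integer encoding `H`, and all function names are assumptions chosen for readability, and nothing here is secure at these parameter sizes.

```python
import hashlib
import math
import secrets

# Toy RSA credentials (N, e, d) for the server. These tiny primes are
# for illustration only and offer no security.
p, q = 1009, 1013
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent

def H(keyword: str) -> int:
    """Hash a keyword into Z_N (assumed encoding, for illustration)."""
    digest = hashlib.sha256(keyword.encode()).digest()
    return int.from_bytes(digest, "big") % N

def client_blind(keyword: str):
    """Client: hide H(keyword) behind a random factor r^e mod N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (H(keyword) * pow(r, e, N)) % N, r

def server_evaluate(blinded: int) -> int:
    """Server: apply the private exponent without seeing the keyword."""
    return pow(blinded, d, N)

def client_unblind(evaluated: int, r: int) -> int:
    """Client: divide out r to recover H(keyword)^d mod N."""
    return (evaluated * pow(r, -1, N)) % N

blinded, r = client_blind("NY")
token = client_unblind(server_evaluate(blinded), r)
```

The server only ever sees `blinded`, which is masked by the random factor. In blinding-based schemes, a value such as `token` is typically used to unlock the one dictionary entry that matches the keyword.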

3.3  PIR  Protocol 

The protocol works in three stages. In the first (preprocessing) stage, S initializes IB with information
from D. In the second (query) stage, C, S, and IB communicate to serve a query from C. In the
third (refresh) stage, a decision is made as to whether the preprocessing stage is re-executed to
gain additional privacy for D. The preprocessing stage is performed once before any queries are
served, and again whenever the refresh stage requires it according to the refresh parameter r.
The query stage is performed each time C has a new query.

• Preprocessing stage:

1. S does the following once, before any queries are served:

- Picks $n$ random keys $k_1, \ldots, k_n$ from $K$, one for each of the $n$ records $r_1, \ldots, r_n$ of $D$.

- Picks a random permutation $\sigma$ of $n$ elements.

- Prepares $n$ ciphertexts $c_1, \ldots, c_n$, where $c_i = E_{k_i}(r_i)$.

- Sends the permuted list $c_{\sigma(1)}, \ldots, c_{\sigma(n)}$ to the isolated box IB.

- Initializes the refresh counter: $cnt \leftarrow 0$.
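
The preprocessing steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `toy_encrypt` is an insecure hash-based stand-in for the encryption scheme, positions are 0-based, and storing ciphertext number sigma[p] at position p of the isolated box's list is one reading of the permutation step that is consistent with the query stage. All names are hypothetical.

```python
import hashlib
import random
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (toy stand-in for E_k).

    Applying it twice with the same key decrypts. Not secure; a real
    system would use a vetted cipher such as AES.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def preprocess(records):
    """Server-side preprocessing: keys, permutation, ciphertexts, counter."""
    n = len(records)
    keys = [secrets.token_bytes(32) for _ in range(n)]      # one key per record
    sigma = list(range(n))
    random.SystemRandom().shuffle(sigma)                    # random permutation
    ciphertexts = [toy_encrypt(keys[h], records[h]) for h in range(n)]
    # Position p of the list sent to the isolated box holds ciphertext sigma[p].
    ib_store = [ciphertexts[sigma[p]] for p in range(n)]
    cnt = 0                                                 # refresh counter
    return keys, sigma, ib_store, cnt

records = [b"alice|NY", b"bob|WI", b"carol|NY"]
keys, sigma, ib_store, cnt = preprocess(records)
```

Only `ib_store` leaves the server in this sketch; the keys and the permutation stay with S until they are released through the KOT step.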

•  Query  stage: 

1. C and S perform a keyword oblivious transfer step. C’s input is $(i, x)$ (the entire
query), and S’s input is:

$$\{([i, x_{ij}], [K_{ij}, L_{ij}]) \mid 1 \le i \le t,\ 1 \le j \le |A_i|,\ x_{ij} \in A_i\} \cup P_D$$

where

$$K_{ij} = \{k_h \mid r_h = (x_{h,1}, \ldots, x_{ij}, \ldots, x_{h,t})\}$$

and

$$L_{ij} = \{\sigma^{-1}(h) \mid r_h = (x_{h,1}, \ldots, x_{ij}, \ldots, x_{h,t})\}$$

and

$$P_D = \{([R_i, x_{R_i}], [k_{R_i}, I_{R_i}]) \mid 1 \le i \le t|D| - \sum_{j=1}^{t} |A_j| \ \wedge\ x_{R_i} \notin A_{R_i}\}$$

That is, $K_{ij}$ holds the keys, and $L_{ij}$ the permuted positions, of the records whose $i$th
attribute takes the value $x_{ij}$. Here $R$ is a random sequence of integers between 1 and $t$, $k$ is a
random sequence of elements from $K$, and $I$ is a random sequence of sets of database indices.
The quantity $t|D| - \sum_{j=1}^{t} |A_j|$ is the number of additional cells that must be added to
hide the number of distinct attribute-value pairs in the database. We assume that
$K_{ij}$, $L_{ij}$, $k_{R_i}$, and $I_{R_i}$ are padded to the same length. The condition $x_{R_i} \notin A_{R_i}$ ensures
that the entries added for padding will never be returned as the result of the KOT
protocol. The purpose of including $P_D$ in the server’s input is to prevent leaking the
number of distinct attribute-value pairs in the database. At the end of the KOT protocol,
C receives $[K_{ij}, L_{ij}]$ such that $(i, x_{ij}) = (i, x)$. This is achieved using the keyword OT
scheme.

2. C asks IB for the encrypted rows listed in $L_{ij}$.

3. IB sends C the ciphertexts indexed by $L_{ij}$: $\{c_{\sigma(\sigma^{-1}(h))} \mid \sigma^{-1}(h) \in L_{ij}\}$.

4. C can now use the elements of $K_{ij}$ to decrypt those returned by IB, thus obtaining:

$\{(x_1, \ldots, x_t, y) \in D \mid x_i = x\}$

5. S updates the refresh counter: $cnt \leftarrow cnt + 1$.
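
The data flow of the query stage can be sketched in Python as below. This is a simplified illustration under several assumptions: a plain dictionary lookup stands in for the KOT step (so the sketch is not oblivious), a one-time-pad XOR stands in for the encryption scheme, positions are 0-based, and the sample rows and all names are hypothetical. It shows how the per-keyword key set and position set are built, how many padding entries would be needed, and how the client combines the two responses.

```python
import random
import secrets
from collections import defaultdict

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical database: each record is (attribute values, payload).
rows = [(("NY", "F"), b"payload-0"),
        (("WI", "M"), b"payload-1"),
        (("NY", "M"), b"payload-2")]
t, n = 2, len(rows)

# Preprocessing stand-in: one-time-pad keys and a random permutation.
plain = [repr(r).encode() for r in rows]
keys = [secrets.token_bytes(len(m)) for m in plain]
sigma = random.sample(range(n), n)          # position p holds record sigma[p]
sigma_inv = {h: p for p, h in enumerate(sigma)}
ib_store = [xor(keys[sigma[p]], plain[sigma[p]]) for p in range(n)]

# Server's KOT input: (i, value) -> (key set, position set).
index = defaultdict(lambda: ([], []))
for h, (attrs, _) in enumerate(rows):
    for i, x in enumerate(attrs, start=1):
        K, L = index[(i, x)]
        K.append(keys[h])
        L.append(sigma_inv[h])

# Dummy entries needed so the input always has t * |D| entries in total.
padding = t * n - len(index)

def query(i, x):
    # Step 1 (stand-in): a plain lookup replaces KOT and is NOT oblivious.
    K, L = index.get((i, x), ([], []))
    # Steps 2-3: the isolated box returns the ciphertexts at positions L.
    fetched = [ib_store[p] for p in L]
    # Step 4: the client decrypts each ciphertext with the matching key.
    return [xor(k, c) for k, c in zip(K, fetched)]

result = query(1, "NY")
```

In this sketch the isolated box sees only the positions in L and the server sees neither the positions nor the ciphertexts requested, which mirrors the separation of knowledge the protocol relies on.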

•  Refresh  stage: 

1. If $cnt > r$, then set $cnt \leftarrow 0$ and re-execute the preprocessing stage.

3.4  Security  Properties 

It can be shown that our protocol does not reveal to the server: (1) the attribute over which the
query is performed, (2) the value of the attribute queried for, or (3) the rows of the database
that are accessed. Properties (1) and (2) follow directly from the guarantees of the KOT
protocol that we use, and as such are subject to the same assumptions as that protocol (namely,
intractability of the one-more-RSA-inversion problem). Property (3) follows from the fact
that rows are only retrieved (during query processing) from the isolated box. Similarly for the
client, this protocol does not leak any information aside from the intended result: the rows of
the database that match the client’s query. This property follows from the guarantees of the KOT
protocol, as well as the security of the encryption scheme E. As such, privacy from an
honest-but-curious client is subject to the same conditions on security as the KOT protocol.

Figure 1: Experimental Setup. (Courtesy of MIT-LL)

The privacy of D from the isolated box follows from the security of our encryption scheme E.
However, the isolated box may learn the frequency with which the client accesses permuted rows
of the database. Because the rows are permuted randomly before being sent to the isolated box, an
adversary would need external (semantic) information in combination with this frequency
information to deduce further information about the contents of the database. Furthermore, the
adversarial utility of this information can be reduced by requiring the server to
periodically re-send the rows to the isolated box using a fresh permutation. This feature is
controlled by the parameter r; low values of r cause the server to enter the refresh phase more
often. The more often this happens, the less useful the information learned by the isolated box
becomes. However, each such refresh comes at the cost of substantial network overhead (for
large databases), in addition to negative cache effects. This gives the protocol a tunable
parameter: the refresh frequency offers various degrees of security for a quantifiable tradeoff in
efficiency.
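
The tradeoff controlled by r can be made concrete with a small simulation of the refresh counter. The Python sketch below follows the counter rule from the protocol (increment after each query, refresh when the counter exceeds r); the workload size and the chosen values of r are hypothetical, and the function names are illustrative only.

```python
def count_refreshes(num_queries: int, r: int) -> int:
    """Simulate the refresh counter: cnt += 1 per query, refresh if cnt > r."""
    cnt = 0
    refreshes = 0
    for _ in range(num_queries):
        cnt += 1                 # query stage, final step
        if cnt > r:              # refresh stage
            cnt = 0
            refreshes += 1
    return refreshes

def max_queries_per_permutation(r: int) -> int:
    """Most queries the isolated box can observe under one permutation."""
    return r + 1

# Hypothetical workload of 1,000 queries under three settings of r.
for r in (9, 99, 999):
    print(r, count_refreshes(1000, r), max_queries_per_permutation(r))
```

Smaller values of r shorten the window over which access frequencies can be correlated, at the price of more frequent re-transfers of the encrypted database.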

4  Lessons  Learned 

We feel that one noteworthy aspect of our work in private information retrieval is that we were
able to scale to the requirements given by IARPA without developing any new cryptographic
primitives (in fact, as noted in Section 2, our protocol exceeded the project performance
goals by more than an order of magnitude). Recall that we rely on symmetric-key
cryptography to efficiently hide data as it travels over untrusted channels, keyword oblivious
transfer to hide the client’s query from the server while retrieving the correct set of rows, and
RSA to blind data within the keyword oblivious transfer protocol. Particularly relevant is
keyword oblivious transfer: the high-level functionality of this primitive parallels that of private
information retrieval so closely that implementing the needed functionality is merely a matter of
scale. In other words, one could use keyword oblivious transfer to implement private information
retrieval, without further modification, were its performance at the scale mandated by IARPA
acceptable. Our insight was to use keyword oblivious transfer only over data that indexes
relevant entries in the large database; when the client and server finish performing keyword
oblivious transfer, the client can use this information to ask the isolated box for the full
information required to complete the private information retrieval protocol.

In one sense, this suggests that all of the mathematical tools needed to realize the demanding
functionality of private information retrieval have existed for years. We see this as further
evidence of the need for a new set of tools that compile privacy-sensitive programs from
high-level specifications to low-level primitives with rigorously proven properties, such as
keyword oblivious transfer. This will allow applications which have seemingly novel privacy
requirements, such as private information retrieval, to benefit from principles developed in the
software engineering community, such as code reuse, abstraction reuse, and low-level code
generation. In the context of privacy-preserving applications, these principles have strong
implications for correctness, as code and abstraction reuse often allow correctness proofs to be
reused without loss of rigor. Removing the need to manually develop new correctness proofs for
each protocol from the ground up is a major advantage. We see this as a primary advantage over
other teams’ solutions: re-using existing primitives to meet project requirements increased the
clarity of our protocol description and correctness proof.

Our  original  proposal  was  based  on  the  concept  of  an  optimizing  compiler  for  privacy-preserving 
applications.  We  view  our  activities  with  the  private  information  retrieval  protocol  presented 
above  as  a  case  study  in  this  larger  effort.  This  project  has  provided  us  with  a  realistic 
application,  corresponding  evaluation  dataset,  and  third-party  testing.  Moving  forward,  we  will 
leverage  this  to  incorporate  the  abstractions  and  functionality  used  to  complete  the  project  into 
such  a  compiler. 

5 Implementation and Performance Evaluation

Implementation

We  implemented  our  protocol  in  10,501  lines  of  C++  source  code  for  Linux.  We  use  SQLite  for 
back-end  database  processing,  as  it  is  lightweight,  easy  to  use,  and  highly  performant  in  the 
single-access,  read-only  setting.  Our  client  prototype  utilizes  multiple  threads  to  avoid  network 
and  encryption-related  bottlenecks.  One  thread  constantly  transfers  data  from  the  isolated  box 
over  the  network,  and  the  other  thread  continually  decrypts  and  displays  the  data.  For  most 
cryptographic primitives, we utilized the OpenSSL library, including 256-bit AES to store an
encrypted copy of the database on the IB, and to generate secure pseudorandom numbers for key
data and database row permutations. We wrote our own implementation of the Kurosawa-Ogata
keyword oblivious transfer protocol, using 1024-bit RSA keys.
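
The two-thread client design described above can be sketched as a producer-consumer pipeline. The prototype itself is written in C++; the sketch below uses Python for brevity, and the function names, the bounded buffer size, and the stand-in "decryption" (reversing bytes) are all assumptions for illustration.

```python
import queue
import threading

def fetch(ciphertexts, out_q):
    """Network thread: stream ciphertexts as they arrive from the isolated box."""
    for c in ciphertexts:
        out_q.put(c)
    out_q.put(None)               # sentinel: no more data

def decrypt_and_collect(in_q, decrypt, results):
    """Worker thread: decrypt each item as soon as it is available."""
    while True:
        c = in_q.get()
        if c is None:
            break
        results.append(decrypt(c))

def run_pipeline(ciphertexts, decrypt, buffer_size=64):
    q = queue.Queue(maxsize=buffer_size)   # bounded to cap memory use
    results = []
    producer = threading.Thread(target=fetch, args=(ciphertexts, q))
    consumer = threading.Thread(target=decrypt_and_collect,
                                args=(q, decrypt, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results

# Hypothetical stand-ins: "ciphertexts" are reversed byte strings.
rows = run_pipeline([b"cba", b"fed"], lambda c: c[::-1])
```

Overlapping transfer and decryption in this way means the slower of the two stages, rather than their sum, bounds the time to deliver a large result set.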

Experimental Setup

We evaluated the performance of our prototype experimentally. We ran the server
and isolated box components on two Dell PowerEdge servers matching the project
specification. The client was run on a Dell Inspiron 1545, also matching the project specification. All
communications took place over a local gigabit Ethernet network. This setup is depicted in
Figure 1.

Dataset  and  Benchmarks 

The data used to perform the evaluation was provided by the MIT-Lincoln Labs test
team. It consists of two components:

•  A  synthetic  database  with  a  schema  corresponding  to  personnel  records  for  a  hypothetical 
company.  The  schema  has  50  components  arranged  in  a  flat  hierarchy,  and  100,000 
records  corresponding  to  non-existent  citizens  with  characteristics  that  fit  the  distribution 
found  in  2000  census  data.  The  total  size  of  this  database  is  approximately  60  gigabytes. 

• 564 database queries arranged in 16 distinct test cases. These queries correspond to
182,348  database  records,  selected  to  test  the  full  range  of  prototype  operation. 

Each  test  query  consists  of  a  SQL  SELECT  statement  over  a  single  attribute,  with  an  equality 
constraint  on  the  value  of  the  attribute.  For  example, 

SELECT  *  FROM  people  WHERE  state  =  'NY' 

To test different aspects of prototype performance, such as the ability to quickly begin returning
data for a large query, the query attribute is varied to account for the characteristics of the
underlying database. For example, querying sparse attributes allows the lookup performance of
the prototype to be evaluated without excess noise due to large result set transfer.

On average, test queries produce results with fewer than 10% of the records in the database. Test
queries were broken into four categories:

•  Tiny  queries:  fewer  than  10  records. 

•  Small  queries:  between  10  and  1000  records. 

•  Medium  queries:  between  1000  and  10000  records. 

•  Large  queries:  greater  than  10000  records. 

The  total  benchmark  suite  contained  304  tiny  queries,  224  small  queries,  32  medium  queries,  and 
4  large  queries. 
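
The category thresholds can be written down as a small classifier. The Python sketch below is illustrative only; the function name is hypothetical, and the treatment of result sets with exactly 1000 or 10000 records is an assumption, since the text does not say which side of each boundary is inclusive.

```python
def classify(num_records: int) -> str:
    """Map a result-set size to a benchmark category.

    Boundary handling (exactly 1000 or 10000 records) is an assumption;
    the category definitions do not state which side is inclusive.
    """
    if num_records < 10:
        return "tiny"
    if num_records <= 1000:
        return "small"
    if num_records <= 10000:
        return "medium"
    return "large"

# Hypothetical result-set sizes, one per category.
labels = [classify(n) for n in (3, 250, 5000, 20000)]
```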


Metrics  and  Goals 

The  experiments  tested  four  aspects  of  the  implementation:  correctness,  query  compilation  time, 
index  lookup  and  search  time,  and  retrieval,  decryption,  and  display  time. 

•  Correctness:  Because  the  requirements  of  PIR  are  stricter  than  those  for  traditional 
networked  data  retrieval,  it  is  conceivable  that  the  functionality  of  a  PIR  system  might 
differ from a traditional system. For each test in the benchmark suite, we ran an identical
test in a baseline MySQL installation to determine a baseline truth. We then checked the
contents of each result against the baseline MySQL results, checking that both:

1. The PIR prototype returns the same number of records as the MySQL installation.

2.  Each  byte  of  each  decrypted  record  returned  by  the  PIR  prototype  matches  the 
corresponding  byte  in  the  corresponding  row  returned  by  the  MySQL  installation. 

•  Client  Query  Compilation  (CQC)  Time:  This  corresponds  to  the  period  of  time  needed 
on  the  client  to  encode  and  send  a  query  to  the  server. 

•  Index  Lookup  and  Search  (ILS)  Time:  This  corresponds  to  the  time  needed  for  the 
client,  server,  and  isolated  box  to  negotiate  the  PIR  protocol.  This  begins  when  the 
client’s  packet  is  first  received  by  the  server,  and  ends  when  the  server’s  first  result 
packet  is  sent. 

•  Retrieval,  Decryption,  and  Display  (RDD)  Time:  This  corresponds  to  the  time  needed 
for  the  server  to  transfer  all  results  to  the  client,  as  well  as  that  needed  by  the  client  to 
decrypt  and  display  the  results.  This  period  begins  when  the  client  outputs  the  first  byte 
of  the  query  result,  and  ends  when  all  results  have  been  displayed. 

Each  of  these  metrics  is  evaluated  for  each  test  query  in  the  benchmark  suite. 
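
The two correctness conditions listed above can be expressed as a short check. The Python sketch below is a minimal illustration, not the test harness used in the evaluation; the function name is hypothetical, and comparing rows as sorted lists assumes that the two systems may return matching rows in different orders.

```python
def results_match(pir_rows, baseline_rows):
    """Check both correctness conditions against the baseline result.

    Rows are byte strings. Sorting before comparison is an assumption:
    it treats the two results as multisets rather than ordered lists.
    """
    # Condition 1: same number of records.
    if len(pir_rows) != len(baseline_rows):
        return False
    # Condition 2: every byte of every record matches its counterpart.
    return sorted(pir_rows) == sorted(baseline_rows)

ok = results_match([b"row-a", b"row-b"], [b"row-b", b"row-a"])
```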

6  Results 

Before we present the details of our results, we note that IARPA presented a number of
performance requirements that the PIR prototype must meet.

1. The average index lookup and search time must be less than 60 seconds.

2.  The  average  retrieval,  decryption  and  display  time  of  the  PIR  system  must  be  no  more 
than  a  factor  of  100  more  expensive  than  a  corresponding  baseline,  non-PIR  MySQL 
system. 

3.  The  PIR  system  must  take  less  than  24  hours  to  bring  the  entire  60  gigabyte  test  database 
online,  ready  to  answer  queries. 

We are happy to report that our prototype exceeds these requirements by substantial
margins. To summarize, our results demonstrate that PIR can be made both practical and
efficient. In particular:

• Bringing the server and isolated box online is relatively inexpensive. For the 60 gigabyte
dataset, it takes approximately three hours to bring all data online and reach a ready
state for query processing. There are two components to this cost: the transfer of
permuted rows between the server and isolated box (~2.5 hours), and pre-computing the
keyword oblivious transfer dictionary (~30 minutes).

• The overhead for performing keyword oblivious transfer is effectively constant, and
nearly negligible. On average, for the full 60 gigabyte dataset, KOT required 4 seconds.
This is significant, as performing KOT constitutes nearly all overhead required by PIR
over standard query processing.

•  Overall  PIR  query  processing  time  is  <2x  the  standard  MySQL  base  time.  For 
sufficiently  large  result  sets,  nearly  all  overhead  is  due  to  decryption  time.  This  time  can 
be  reduced  further  with  increased  parallelism  and  faster  encryption  primitives. 

• Our prototype returned correct results for all tests: both the number of records and the
contents of each record match those returned by the baseline MySQL installation.
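
The margins behind these statements can be checked with simple arithmetic. The Python sketch below compares the reported figures with the three requirements; treating the 4-second KOT time as the index lookup and search time, and the "less than 2x" overall figure as an upper bound on the retrieval factor, are assumptions made for illustration, as are all names.

```python
# Thresholds stated in the requirements.
MAX_ILS_SECONDS = 60
MAX_RDD_FACTOR = 100
MAX_SETUP_HOURS = 24

def meets_requirements(ils_seconds, rdd_factor, setup_hours):
    """Return a per-requirement pass/fail map."""
    return {
        "ils": ils_seconds < MAX_ILS_SECONDS,
        "rdd": rdd_factor <= MAX_RDD_FACTOR,
        "setup": setup_hours < MAX_SETUP_HOURS,
    }

# Reported figures: about 4 s for KOT, under 2x the MySQL base time,
# and about 2.5 h + 0.5 h to bring the database online.
ils, rdd, setup = 4, 2, 2.5 + 0.5
report = meets_requirements(ils, rdd, setup)

# Ratio of each threshold to the reported figure.
margins = {
    "ils": MAX_ILS_SECONDS / ils,
    "rdd": MAX_RDD_FACTOR / rdd,
    "setup": MAX_SETUP_HOURS / setup,
}
```

Under these assumptions the lookup and retrieval figures sit more than an order of magnitude inside their limits, while setup time is within its limit by a factor of eight.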

A  sampling  of  our  results  is  displayed  in  Figure  2,  which  displays  the  query  processing  time  for 
our  PIR  prototype  versus  the  MySQL  base  time  over  tests  from  each  category.  The 
“unoptimized”  curve  corresponds  to  our  prototype  with  no  parallelism,  and  the  “optimized” 
curve  corresponds  to  the  implementation  with  one  additional  processing  thread. 


Figure  2:  PIR  Time  vs.  MySQL  base  time 

We present more detailed results in the remainder of this section. In each plot, when a curve is
fitted to the data, it is fitted using least-squares regression. The coefficient of determination for
each curve is labeled $R^2$. This coefficient takes values between 0 and 1; values near 1 indicate
an excellent fit, and values near 0 indicate a weak correlation between the curve and data.
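
For readers who want to reproduce this kind of fit, the Python sketch below computes an ordinary least-squares line and its coefficient of determination. It covers only the straight-line case as an example; the form of the curves fitted in the plots is not specified here, and the function name and sample points are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = a*x + b and its R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2

# Hypothetical points lying exactly on y = 2x + 1.
a, b, r2 = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

With only a handful of sample points, as in the smallest test categories, a high $R^2$ value can arise by chance and should be read with caution.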

Tiny Queries

The tiny test set consists of 304 queries that return fewer than ten rows from the test database. Our
results are given in four plots below. The first plot corresponds to the total query response time;
note that the total query response time for our prototype is greater than that for the default
MySQL installation. The second plot shows the client query compilation time. For tiny queries,
the query compilation time of our prototype consumes nearly all of the total query response time.
This reflects the fact that the time required to complete keyword oblivious transfer is
independent of the size of the query result. So, while the results do not take long to transmit from
the isolated box to the client (reflected in the final plot), the query compilation time is as
expensive as it would be for larger queries.

While  it  may  seem  as  though  this  result  implies  that  our  protocol  is  not  well-suited  for  tiny 
queries,  observe  that  the  total  query  response  time  is  still  small.  We  feel  that  the  tradeoff  is 
justified,  given  the  scale  of  the  data  involved. 

Note that the index lookup and search plot indicates no time needed by our prototype. This is due
to the fact that client query compilation and index lookup and search correspond to the same
portion of our protocol; the entire time is accounted for in the client query compilation data.


Figure 3: Response Time for Tiny Queries

Small Queries


The small test set consists of 224 queries that return between ten and 1000 records from the test
database. Our results are given in four plots below. The first plot corresponds to the total query
response time; note that the total query response time for our prototype is greater than that for the
default MySQL installation, but only by a constant factor. This result is due to the need to
decrypt results sent from the isolated box, and the gap can be reduced with
increased parallelism and faster hardware. The second plot shows the client query compilation
time. Note that the query compilation time for small queries falls in the same range as for tiny
queries. This reflects the fact that keyword oblivious transfer time is independent of the query
result size. Rather, it scales linearly in the number of records in the database and the amount of
value repetition among the entries.

Figure 4: Response Time for Small Queries

Medium Queries


The medium test set consists of 32 queries that return between 1000 and 10,000 records from the
test database. Our results are given in four plots below. The first plot corresponds to the total
query response time. As in the previous set of test cases, our prototype’s total query response
time differs from the baseline MySQL time by a constant factor. This gap can be reduced
using the same methods discussed previously. The second plot shows the
client query compilation time. Note that the query compilation time for medium queries falls in the
same range as for tiny and small queries. This reflects the fact that keyword oblivious transfer
time is independent of the query result size. However, for result sets of this size, query
compilation corresponds to a much smaller portion of the overall response time, as
mirrored in the striking similarities between the first and fourth plots below. At this point, the
time needed to complete the keyword oblivious transfer protocol is dwarfed by
the time needed to transfer and decrypt the large amount of result data.

Figure 5: Response Time for Medium Queries

Large Queries


The large test set consists of 4 queries that return more than 10,000 records from the test
database. Recall that the entire test database is 60 gigabytes, so each query in the large set
corresponds to between five and ten gigabytes of response data. Our results are given in four
plots below. The first plot corresponds to the total query response time. As in the previous set of
test cases, our prototype’s total query response time differs from the baseline MySQL time by a
constant factor. The second plot shows the client query compilation time. Note that the curve that
fits these data points indicates a correlation between result size and query compilation time. This
is almost surely a spurious effect of the small number of sample points; there is no reason to
believe that query compilation time should differ at all from the other test sets. As with the
medium tests, query compilation corresponds to a small portion of the overall response time.

Figure 6: Response Time for Large Queries

7 Activities


We  performed  the  following  activities  throughout  the  course  of  this  project. 

1.  Developed  initial  protocol,  which  was  based  on  pre-computing  all  possible  database  views 
for  a  particular  attribute,  and  transferring  them  to  the  isolated  box  on-demand.  This  gives  the 
isolated  box  a  coarser  view  of  database  accesses,  but  is  not  nearly  as  performant  as  the 
approach  discussed  above. 

2.  Modified  initial  protocol  to  work  over  rows  rather  than  views,  to  arrive  at  the  approach 
discussed  above. 

3.  Implemented  the  primitives  needed  by  the  protocol,  including  the  Kurosawa-Ogata  keyword- 
oblivious  transfer  protocol,  the  RSA  algorithm,  and  the  needed  operations  over  large 
integers. 

4.  Implemented  a  prototype  of  the  client,  server,  and  isolated  box  in  C++. 

5.  Integrated  the  prototype  with  the  MIT-LL  test  harness. 

6.  Installed  and  configured  the  hardware  infrastructure  needed  to  run  the  performance 
evaluation. 


8  Conclusion 

We demonstrated an efficient protocol for performing keyword search in a privacy-preserving way. It
was a very rewarding experience for our team, and we want to pursue several future directions for
this project. On the fundamentals side, we want to explore support for more expressive queries and
to investigate whether the protocol can be made more efficient. The University of Wisconsin has a
very strong database group, and we plan to collaborate with it to find more applications of PIR
technology. We are very excited about further opportunities on this project.

9  Acronyms 

AES - Advanced Encryption Standard
CQC - Client Query Compilation
IARPA - Intelligence Advanced Research Projects Activity
ILS - Index, Lookup and Search
KOT - Keyword Oblivious Transfer
MIT-LL - Massachusetts Institute of Technology Lincoln Laboratory
PIR - Private Information Retrieval
RSA - Rivest, Shamir and Adleman
SSL - Secure Sockets Layer

Appendix A: SSH Password Authentication Using Secure Function Evaluation

The  following  is  an  unpublished  manuscript  containing  research  done  under  the  PIR  contract. 
The  authors  are  Louis  Kruger,  Matthew  Fredrikson,  Somesh  Jha,  and  Vitaly  Shmatikov,  all 
supported  by  the  contract  for  the  duration  of  this  research. 

SSH Password Authentication Using Secure Function Evaluation


Abstract 

Over the years, SSH has evolved from a secure alternative to telnet into a robust and extensible
protocol that can serve as a secure transport layer for applications that need strong cryptographic
security. Interactive password-based authentication, however, remains one of the most popular choices
for common SSH deployments, leaving opportunity for an active malicious adversary to either learn
the client’s password, or impersonate the server or the client. Password-Authenticated Key Exchange
(PAKE) protocols are designed to be resilient to these attacks. Unfortunately, PAKE protocols are
impractical for many common deployment scenarios (e.g., they are not backwards-compatible with
legacy password storage schemes, requiring users to re-establish their passwords).

We  present  a  practical  protocol  that  provides  equivalent  guarantees  to  existing  PAKE  protocols, 
but is suitable as a “drop-in” authentication module in existing deployments for SSH, as well as other
password-authenticated  Internet  services.  To  accomplish  this,  we  use  secure  function  evaluation 
(SFE)  to  compare  the  password  credentials  between  the  client  and  server,  embedding  computation 
of  the  hash  function  into  the  comparison  protocol.  We  have  implemented  an  SSH  client  and  server 
that  use  our  scheme  and  released  an  open-source  version  of  the  software  that  is  freely  available  for 
download. 


1  Introduction 

Originally designed as a secure alternative to telnet, SSH has since evolved into a layered protocol that
serves as the secure transport layer over which many other protocols execute. This functionality has
simplified the task of providing cryptographic security to applications that need it. Unfortunately, it has
also encouraged SSH deployment in settings where strong authentication mechanisms are not available
or, worse yet, take a backseat to more convenient methods such as interactive password login. This
is a significant problem. Casual SSH users may assume that the mere presence of SSH guarantees
security, unaware of the risks associated with password authentication and improper use of public-key
cryptography [37]. It is difficult for a human user to verify the authenticity of the server’s public key
from a hexadecimal fingerprint; in Section 3.1, we describe a man-in-the-middle attack on SSH password
authentication which exploits this fact. We note that this attack is not new. This problem is not
limited to SSH, but also affects other Internet services relying on password authentication.

We  designed  and  implemented  a  practical,  yet  cryptographically  secure  protocol  for  password-based 
authentication and key establishment in SSH. Even though we use our protocol in the context of SSH,
our  technique  can  be  applied  to  any  scenario  where  password-based  authentication  is  necessary.  An 
implementation  of  our  protocol  is  available  at  ANONYMIZED.  Our  protocol  satisfies  three  important 
requirements. 

(1) Compatible with legacy infrastructure. Our protocol is compatible with existing password
authentication infrastructures. It does not require any changes to legacy servers beyond upgrading
the SSH software and is thus deployable in common settings. The use of cryptographic hash
databases to store passwords is common practice on both Unix and Windows systems [34].
Typical Linux systems (current versions of Ubuntu [35], RedHat [33], and Debian [11]) use
either the MD5 or the SHA-512 hash function, with salts and iterated rounds for added security
against offline brute-force attacks. Current versions of Windows use a proprietary NT Hash
technology [31], but the principle is identical. Our protocol is specifically designed to support
storage of passwords in hashed form.

By contrast, other solutions for password-authenticated key exchange require users to re-generate
passwords,  which  greatly  limits  their  deployability.  They  cannot  be  installed  on  legacy  servers 
with  large  existing  user  bases.  Some  also  require  additional  information  to  be  stored  on  the  server 
or  assume  the  existence  of  public-key  infrastructure  (PKI). 

(2) Does not decrease security of password storage. At the very least, the password authentication
mechanism should not provide weaker security guarantees than the current system, in which users'
passwords  are  stored  on  the  server  in  hashed  form.  If  the  server  stores  passwords  in  the  clear,  a 
compromise  of  the  server  will  reveal  the  passwords  of  all  users.  Even  without  an  external  attack, 
a  malicious  server  operator  may  impersonate  a  user  in  other  authentication  domains. 

Our  protocol  takes  as  inputs  the  password  from  the  user  and  the  hashed  password  from  the  server 
(it  is  essential  that  the  user's  input  be  the  actual  password  and  not  a  hash;  otherwise,  a  malicious 
server  could  impersonate  the  user).  Therefore,  from  the  viewpoint  of  password  security,  it  is  as 
strong as existing solutions, while providing significantly more protection against
man-in-the-middle attacks.

(3) Enables derivation of a secure, shared cryptographic key. Our protocol enables the user and the
server to derive one or more shared cryptographic keys which can be used to protect their subsequent
communications. The key remains secure (i.e., indistinguishable from random) even in the presence
of a malicious man-in-the-middle adversary. Unlike existing methods for password authentication
in SSH, our protocol does not require the user to check the validity of the server's public key by
manually verifying its fingerprint (we argue that this requirement is largely ignored in practical
deployment scenarios).

Against an active adversary, the protocol is as secure as can be hoped for in the case of
password-based authentication. It does not leak any information except the outcome of an
authentication attempt, i.e., for any given password, the adversary can check whether the password
is correct. Brute-force password-cracking remains feasible, but every attempt requires executing an
instance of the protocol.
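
Requirements (1) and (2) both rest on the server holding only a salted, iterated hash of each password. The following Python fragment is a minimal sketch of that storage model. It is not the actual crypt(3) MD5 or SHA-512 scheme used by the Linux distributions mentioned above; the function names, the 16-byte salt, and the round count are all illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, salt: bytes, rounds: int = 5000) -> bytes:
    """Toy salted, iterated SHA-512. NOT the real crypt(3) algorithm."""
    digest = hashlib.sha512(salt + password.encode("utf-8")).digest()
    for _ in range(rounds - 1):
        # Re-hashing makes each offline guess proportionally more expensive.
        digest = hashlib.sha512(salt + digest).digest()
    return digest


def make_record(password: str) -> Tuple[bytes, bytes]:
    """What the server stores: a random salt and the hash, never the password."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(password: str, record: Tuple[bytes, bytes]) -> bool:
    """Recompute the hash from a candidate password and compare in constant time."""
    salt, stored = record
    return hmac.compare_digest(hash_password(password, salt), stored)
```

In conventional SSH password login the server runs a check like `verify` on a password it has received in the clear; the protocol described here aims to obtain the same one-bit answer without the server ever seeing the password.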

Exploiting the special features of password authentication. Our protocol uses Yao's “garbled
circuits” protocol for secure function evaluation (SFE) as a basic building block. SFE is used to compute
the  hash  of  the  SSH  client's  password  and  compare  it  for  equality  with  the  hash  value  provided  by  the 
SSH  server. 

Yao's original protocol is only secure against passive or semi-honest adversaries [27, 39], i.e., if all
participants faithfully follow the protocol. This model is clearly unsuitable for SSH, which must be
secure even if one of the participants maliciously deviates from the protocol specification. This includes
the case when a malicious SSH client (who constructs the garbled circuits in our protocol) deliberately
creates a faulty circuit in an attempt to learn the server's input into the protocol. For example, the client
may put malformed ciphertexts into the rows of the garbled truth table which will only be evaluated
when a certain input bit from the server is equal to “1,” and correct ciphertexts into the rows which will
be evaluated when this bit is equal to “0.” By observing whether the server's evaluation of this circuit
fails or not, the malicious client can learn the value of the bit in question. The malicious client may
also submit a circuit which computes something other than the hash-and-check-for-equality function
required by SSH authentication.
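
The malformed-ciphertext attack described above can be illustrated on a single garbled AND gate. The Python sketch below is a toy under stated assumptions: the hash-based row encryption, the all-zero tag used to recognize a valid decryption, and every function name are illustrative choices rather than the construction used in our implementation, and the oblivious transfer that would deliver the server's label is omitted.

```python
import hashlib
import os
import random

LABEL_LEN = 16      # bytes per wire label
TAG = bytes(16)     # all-zero tag that marks a successful decryption


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from the two input-wire labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate(malicious: bool = False):
    """Garble one AND gate; a malicious garbler corrupts the rows for server bit 1."""
    client = [os.urandom(LABEL_LEN) for _ in range(2)]   # labels for client bit 0/1
    server = [os.urandom(LABEL_LEN) for _ in range(2)]   # labels for server bit 0/1
    output = [os.urandom(LABEL_LEN) for _ in range(2)]   # labels for output bit 0/1
    table = []
    for a in (0, 1):
        for b in (0, 1):
            row = _xor(_pad(client[a], server[b]), output[a & b] + TAG)
            if malicious and b == 1:
                row = os.urandom(len(row))               # malformed ciphertext
            table.append(row)
    random.shuffle(table)                                # hide which row is which
    return client, server, table


def evaluate(client_label: bytes, server_label: bytes, table):
    """Return the output label, or None if no row decrypts correctly."""
    pad = _pad(client_label, server_label)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    return None


def leaked_bit(server_bit: int) -> int:
    """What a malicious client infers from whether the server's evaluation failed."""
    client, server, table = garble_and_gate(malicious=True)
    failed = evaluate(client[1], server[server_bit], table) is None
    return 1 if failed else 0
```

With an honest table, evaluation succeeds for every input pair; with the corrupted table, it fails exactly when the server's bit is 1, so the success or failure signal alone reveals that bit to the client.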

Yao's protocol can be modified to achieve security against malicious participants, either via
cut-and-choose techniques [26, 36] or via special-purpose zero-knowledge proofs [22] which enable the
server to verify that the circuit is well-formed, but the resulting constructions, while more efficient than
generic transformations, are still too expensive for practical use.

Our SFE-based construction in this paper exploits the special structure of the authentication problem
in a fundamental way. The purpose of the password authentication subprotocol in SSH is to compute a
single bit for the client: whether the hash of the password submitted by the client is equal to the value
submitted by the server or not. The standard cut-and-choose construction for SFE in the malicious model
requires that the server evaluate several garbled circuits submitted by the client and the majority of them
must be correct [26]. In the context of password authentication for SSH, it is sufficient that a single
circuit be correct. Even if all but one of the circuits evaluated by the server are faulty, a malicious client
does not learn any more than he would have learned simply by submitting a wrong password.

Our key observation is that to prevent a malicious client from authenticating without the correct
password, it is sufficient for the SSH server to either (a) detect that one of the circuits submitted by
the client is incorrect, or (b) evaluate at least one correct circuit. In other words, the SSH server either
detects the client's misbehavior, or rejects the client's candidate password because its hash does not
match the server’s value. In either case, the authentication attempt is rejected.

We prove the security of our protocol against malicious clients in a (modified) covert model of secure
computation [5, 20]. Security in the covert model guarantees that any deviation from the protocol will
be detected with high probability. In our proof, we instead show that, with high probability, either the
deviation is detected, or the protocol computes the same value as it would have computed had the client
behaved correctly. Security in this model can be achieved at a lower cost than “standard” security against
malicious participants, enabling significant performance gains for our implementation compared with
off-the-shelf SFE.

Security  of  an  honest  client  against  a  malicious  SSH  server  follows  directly  from  the  security  of  the 
underlying oblivious transfer (OT) protocol against malicious choosers, since the server's input into the
protocol is limited to his acting as a chooser in the OT executed as part of Yao's protocol. While the
server can always perform a denial-of-service attack by refusing to communicate the result of
authentication to the client, this is inevitable in any client-server architecture.

The protocol is secure against replay attacks, since a man-in-the-middle eavesdropper on an instance
of the protocol does not learn anything about the client’s input (password), server's input (password
hash), or the shared key established by the client and the server. Furthermore, we show that even if a
man-in-the-middle  attacker  tampers  with  the  protocol  execution,  he  does  not  learn  more  than  he  would 
have  learned  simply  by  attempting  to  authenticate  with  a  wrong  password. 

PAKE protocols. Bellovin and Merritt pioneered a class of protocols that use the client password as a
shared secret for mutual authentication [7]. These protocols, commonly referred to as PAKE
(Password-Authenticated Key Exchange), are resistant to the password compromise scenario described
above, even when the client is communicating directly with a malicious impersonator. Furthermore,
these protocols alert the client to the presence of an impersonator, allowing the SSH user to cut further
communications in high-risk situations. However, existing PAKE protocols are difficult to deploy in
many settings, especially when legacy servers and legacy hashed-password files are involved (see
Section 2).

In  this  paper,  we  present  the  first  password-based  authentication  and  key  establishment  protocol  to 
satisfy  the  three  design  principles  listed  above.  We  show  that  the  secure  password  storage  and  the  secure 
key establishment requirements can be achieved by comparing the authentication credentials of the user
and the server using secure function evaluation (SFE) [38], in a legacy-compatible manner.

The  main  insight  that  enables  backward  compatibility  with  existing  infrastructures  is  that  SFE  gives 
the  protocol  complete  flexibility  to  compute  arbitrary  hash  functions  while  performing  authentication. 
This  makes  our  protocol  suitable  as  a  “drop-in”  authentication  module  in  most  legacy  environments, 
requiring only that the server and client software be updated to use the new protocol.

Organization of the paper. In Section 2, we discuss related work, and explain why existing PAKE
protocols are not suitable for SSH in terms of the three requirements listed in the introduction. In
Section 3, we present a technical overview of our problem setting, as well as our proposed solution. In
Section 4, we describe the design and implementation of our scheme, and in Section 5 we evaluate it.

Figure 1: A comparison of existing PAKE protocols. The protocols listed are EKE [7], AEKE [8],
Ω-method [17], and Multiple-Server [12]. There are a number of protocols in the literature similar in
nature to EKE and AEKE; these are referenced in the text but left out of this table for the sake of clarity.

2  Related  Work 

Password-Authenticated  Key  Exchange  (PAKE)  is  a  class  of  password-based  authentication  protocols 
designed to be secure even against active adversaries. There are many PAKE protocols in the literature.
The first PAKE protocol was described by Bellovin and Merritt as Encrypted Key Exchange (EKE) [7].
EKE allows two parties to communicate securely using a weak secret, such as a human-memorable
password. The authors observed that a standard symmetric cryptosystem keyed on the weak secret does
not provide strong security, and instead proposed to use a temporary asymmetric key pair to exchange
a stronger symmetric key for use in the rest of the session (unlike passwords, bitstrings used as keys
in common asymmetric schemes are essentially random and are thus difficult to verify by brute-force
analysis of protocol messages). EKE provides basic mutual authentication and satisfies our requirement
(3). It does not satisfy (1) or (2), because in standard deployment for password authentication, the server
stores only the hashes of clients' passwords. A number of subsequent protocols have the same properties
in terms of our requirements [1-3, 6, 9, 14-16, 24, 25, 28, 42].

Bellovin and Merritt developed Augmented Encrypted Key Exchange (AEKE) [8] to enable the server
to store only the hash of the password, to protect the latter in case the password database is compromised
by an adversary. To prevent impersonation of the client by such an adversary, the protocol uses schemes
which allow one party to verify that the other knows both the password and its hash. The first scheme
uses a class of commutative one-way hash functions. However, there are no known hash functions with
the information-hiding properties required to guarantee the security of the protocol, making this scheme
of  theoretical  interest  only.  The  second  scheme  uses  the  hashed  password  stored  by  the  server  as  the 
public  key  in  a  digital  signature  scheme.  To  prove  his  knowledge  of  the  password,  the  client  signs 
the  session  key  with  the  corresponding  private  key.  AEKE  implemented  with  this  scheme  satisfies  our 
requirements  (2)  and  (3).  Unfortunately,  using  passwords  stored  on  the  server  as  signature  keys  requires 
substantial  changes  to  the  authentication  infrastructure  and  violates  requirement  (1). 

Gentry et al. [17] proposed the Ω-method for adding security against server compromise to an
arbitrary PAKE protocol. However, all known feasible implementations of their method require the server
to store additional information, namely a public/private key pair with the secret key encrypted. Thus,
applying  this  method  to  one  of  the  previously  described  protocols  will  result  in  a  set  of  implementation 
constraints  basically  equivalent  to  AEKE  [8],  and  ultimately  fail  to  satisfy  our  requirement  (1). 

Ford and Kaliski [12] presented a PAKE protocol that protects the secrecy of the client's password
against server compromise by distributing it among many servers. When the client authenticates, it
interacts with each server to establish a set of strong secrets, after which the servers collaborate to validate
the client’s identity. A number of subsequent protocols adopt this basic functionality; MacKenzie et al.
generalize the protocol to a threshold setting [29], and Brainard et al. present a lightweight protocol that
reduces the computation load on the client [10]. While these protocols satisfy our security requirements
((2) and (3)), they have the obvious drawback of requiring a specific server-side architecture that may
not be common in many settings, and thus fail to satisfy requirement (1).

3 SSH Protocol Overview

The  SSH  protocol  enables  secure  network  services,  including  remote  login  and  traffic  tunneling,  over 
insecure  networks  such  as  the  Internet  [40].  The  functionality  of  the  protocol  is  partitioned  into  three 
layers,  with  each  layer  defined  in  terms  of  messages  from  the  layer  beneath  it. 

• The transport layer provides privacy, integrity, and server authentication for the user
authentication protocols, as well as the application connection protocols, running on top of it. In short, it
provides the layers above it with a plaintext interface for sending encrypted packets reliably over
the network.

• The user authentication layer authenticates the client to the server. In keeping with the
modular design of the protocol, this layer is extensible to a number of authentication mechanisms.
The specification for this layer includes public key, password, and host-based authentication
sub-protocols [41]. However, the majority of deployments use password-based authentication for its
convenience and simplicity.

•  The  connection  layer  multiplexes  many  distinct  communication  channels  over  the  SSH  transport 
layer.  Several  channel  types  have  been  defined  for  various  applications,  including  terminal  shell 
channels  for  remote  login,  and  traffic  forwarding  channels  for  encrypted  tunnels. 

A  typical  SSH  session  proceeds  by  working  through  these  layers  in  sequence:  first,  the  SSH  transport 
layer is established, after which the user is able to securely authenticate to the server, and finally the
application-specific connections are initiated over the transport layer.

Session Initialization: To establish the SSH transport layer, the server and client must (1) perform a
key exchange to establish a shared secret that is used to encrypt future communications, and (2) validate
the server's key, to prevent a man-in-the-middle attack. The SSH specification includes a single
Diffie-Hellman group for key exchange [40], although later proposals have extended this layer to allow new
Diffie-Hellman groups to be added as needed [13]. As described in Section 1, the host's key is validated
by querying the user. Thus, this layer is responsible for the vulnerability described in Section 1.

User  Authentication:  When  the  SSH  transport  layer  has  been  established,  the  client  and  server  have 
a  secure  channel  over  which  they  can  communicate,  and  the  server  has  supposedly  been  authenticated 
to the client. However, most applications require the client to authenticate to the server. The user
authentication layer handles this in an extensible way, by defining a set of messages that can be used to
relay general authentication data. The specification describes several mechanisms for authentication,
including the username/password method familiar to all users of SSH, as well as public key-based
authentication [41]. However, as long as the server and client software can agree on an authentication
method, it is straightforward to extend this layer to use new mechanisms that provide better security.
For example, Yang and Shieh proposed the use of smart cards for authentication [37], which has
subsequently been implemented in at least one SSH software package [32]. Our proposed protocol fits into
the  SSH  protocol  in  this  layer. 

When  password  authentication  is  used,  the  protocol  proceeds  as  follows  (depicted  in  Figure  2): 

1. The client sends to the server a message of type SSH_MSG_USERAUTH_REQUEST, containing the
username  and  password  given  by  the  user. 

2. Based on the contents of the SSH_MSG_USERAUTH_REQUEST, the server responds to the client
in  one  of  two  ways: 

• If password authentication is disallowed, or the username/password combination supplied is
incorrect, then the server responds with an SSH_MSG_USERAUTH_FAILURE message.

• If password authentication is allowed, and the username/password combination is valid, then
the server responds with an SSH_MSG_USERAUTH_SUCCESS message.

Figure 3: Man-in-the-middle attack on the SSH authentication protocol using Diffie-Hellman key
exchange as described in the SSH-2.0 specification. Using this attack, an adversary can learn the client’s
password, eavesdrop on all communications between client and server, and impersonate the server.

3. If the server sends SSH_MSG_USERAUTH_SUCCESS, then the client has successfully
authenticated and may begin requesting services. Otherwise, the protocol terminates.
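
The three steps above amount to a small decision procedure on the server. The Python sketch below models it under simplifying assumptions: the message numbers are those assigned by the SSH authentication specification, but the handler name, the in-memory user table, and the use of a single unsalted SHA-256 hash are illustrative only.

```python
import hashlib
import hmac

# Message numbers assigned by the SSH authentication protocol specification.
SSH_MSG_USERAUTH_REQUEST = 50
SSH_MSG_USERAUTH_FAILURE = 51
SSH_MSG_USERAUTH_SUCCESS = 52


def handle_userauth_request(username: str, password: str, users: dict,
                            password_auth_allowed: bool = True) -> int:
    """Server side of steps 1-3: return the reply message type.

    `users` maps a username to the stored hash of that user's password
    (hypothetical storage; unsalted SHA-256 is used only for brevity).
    """
    if not password_auth_allowed:
        return SSH_MSG_USERAUTH_FAILURE
    stored = users.get(username)
    candidate = hashlib.sha256(password.encode("utf-8")).digest()
    if stored is None or not hmac.compare_digest(stored, candidate):
        return SSH_MSG_USERAUTH_FAILURE
    return SSH_MSG_USERAUTH_SUCCESS
```

Note that `password` arrives at the handler in the clear: the transport layer encrypts it in transit, but whoever terminates the tunnel sees the password itself.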

3.1  Man-in-the-middle  attack  on  conventional  SSH  password  authentication 

When  SSH  password  authentication  is  used,  the  client  and  server  first  negotiate  an  encrypted  tunnel, 
over which the client sends the password for verification. If an attacker were somehow able to eavesdrop
on this encrypted tunnel, the password itself would be revealed to the attacker, who would then be able
to impersonate the client at will. A man-in-the-middle attack on the encrypted tunnel, if successful,
would allow such a password interception. To prevent such an attack, SSH relies on host keys [41] to
authenticate the server. However, host keys alone do not entirely solve the problem, as it is necessary to
authenticate each server key when a session is initiated. Under certain circumstances, an attacker may
still  be  able  to  mount  a  successful  attack.  Figure  3  depicts  this  situation  when  the  Diffie-Hellman  key 
exchange  is  used: 

1. After the client and server agree on a key exchange protocol, the client attempts to send the public
exponentiated integer P_c to the server.

2. The attacker intercepts P_c, replaces it with a value known to him, P_a, and sends it to the server.

3. The server sends his public integer, along with a host key that is supposed to prove his identity:

P_s || K_s

4. The attacker intercepts P_s || K_s and sends P_a || K_a to the client.

5. Using the exchanged public keys, the attacker constructs separate shared secrets S_c and S_s with
the client and server, to be used for further communications.

6. The client's software hashes the key that it receives, K_a, and checks a local keystore to see if the
hash is recognized. If the client does not have the real server's public key K_s in his keystore,
then the client software asks the user to verify the server key's authenticity:

The  authenticity  of  host  'server  (1.2. 3. 4)'  can’t  be 
established.  RSA  key  fingerprint  is 
3f:76:22:43:c2:03:b9:71:b0:31:ce:87:37:45:cb:02. 

Are  you  sure  you  want  to  continue  connecting  (yes/no)? 


On the other hand, even if the user knows the correct server key, he may assume the key has
changed  for  a  non-malicious  reason,  such  as  a  software  upgrade,  and  allow  the  connection  to 
proceed. 

7. The user validates the authenticity of the key based on its hexadecimal fingerprint, thereby
mistakenly asserting that the attacker is the authentic server.

8. The client attempts to send S_c(SSH_MSG_USERAUTH) to the server, containing login credentials
encrypted with the shared secret S_c. In this case, the credentials consist of a username and
password.

9. The attacker receives S_c(SSH_MSG_USERAUTH), and is able to decrypt it to read the password in
clear text.
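
The key-substitution at the heart of this attack can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the modulus is a small toy prime rather than an SSH Diffie-Hellman group, the XOR keystream stands in for the negotiated cipher, and host-key handling is reduced to the assumption that the user accepts the attacker's key in steps 6 and 7.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus, NOT one of the SSH Diffie-Hellman groups
G = 3            # toy generator


def keypair():
    """Return (private exponent, public value G^private mod P)."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def shared_key(priv: int, peer_pub: int) -> bytes:
    """Hash the Diffie-Hellman shared value into a symmetric key."""
    return hashlib.sha256(str(pow(peer_pub, priv, P)).encode()).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 keystream (encryption equals decryption)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def run_attack(password: bytes):
    c_priv, c_pub = keypair()   # client
    s_priv, s_pub = keypair()   # server
    a_priv, a_pub = keypair()   # attacker
    # Steps 1-4: each honest party receives the attacker's public value.
    client_secret = shared_key(c_priv, a_pub)
    server_secret = shared_key(s_priv, a_pub)
    # Step 5: the attacker derives both secrets from the values he intercepted.
    attacker_s_c = shared_key(a_priv, c_pub)
    attacker_s_s = shared_key(a_priv, s_pub)
    # Steps 6-7 (assumed): the user accepts the attacker's host key.
    # Step 8: the client encrypts its credentials under its secret.
    ciphertext = xor_stream(client_secret, password)
    # Step 9: the attacker decrypts with the same secret.
    recovered = xor_stream(attacker_s_c, ciphertext)
    return recovered, client_secret == attacker_s_c, server_secret == attacker_s_s
```

Calling `run_attack(b"hunter2")` returns the password unchanged together with two `True` flags, reflecting that the attacker shares one secret with the client and another with the server and can therefore relay traffic between them.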

Critical to the success of this attack is that the user validates the authenticity of the attacker's public key
as the server's, in step (7), an action which we assert is highly probable. Any OpenSSH user is familiar
with the message displayed in step (6). According to the SSH protocol RFC, the fingerprint “...can
easily be verified by using telephone or other external communication channels” [40]. Not surprisingly,
recent research has indicated that one can expect the average user to simply click through this dialog
without going to such trouble [4], thus accepting the attacker's key; this undermines the very purpose of
presenting host key fingerprints to the user. Although the designers of the SSH protocol were aware of
this problem when they released the specification, they assumed that widespread future PKI deployment
would make it unimportant [40].

4  Protocols 

We  need  a  protocol  that  provides  the  following  functionality:  given  the  client’s  input  x  (presumably  the 
password)  and  the  server's  input  y  (presumably  hash  of  the  password),  the  two  parties  would  like  to 
jointly compute whether H(x) = y, for some hash function H. In other words, we need a protocol
for the following functionality:


(x, y) ↦ (δ(H(x), y), δ(H(x), y))

where δ(a, b) is equal to 1 if a = b, and 0 otherwise. The first question we must answer is: under
which  model  should  our  protocol  be  secure?  In  the  semi-honest  model,  the  adversaries  follow  the 
correct  protocol,  but  might  try  to  infer  additional  information  from  the  messages  exchanged  during  the 
protocol. The classic protocol presented by Yao [39] can be used to produce a protocol for our problem
that is secure in the semi-honest model. An extensive treatment of Yao's protocol along with a proof
of correctness is given in [27]. However, the semi-honest model is not suitable in our context, because
SSH is frequently used over wide-area networks (WAN) where we cannot expect the parties to obey the
semi-honest model.
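
The functionality defined at the start of this section can be written down as a short Python reference model. This is only the ideal behavior, computed in one place by a trusted party, that a secure two-party protocol has to emulate; it is not a protocol in itself. The function names and the choice of SHA-512 for H are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Callable, Tuple


def delta(a: bytes, b: bytes) -> int:
    """delta(a, b) = 1 if a = b, and 0 otherwise."""
    return 1 if hmac.compare_digest(a, b) else 0


def sha512(message: bytes) -> bytes:
    return hashlib.sha512(message).digest()


def ideal_functionality(x: bytes, y: bytes,
                        H: Callable[[bytes], bytes] = sha512) -> Tuple[int, int]:
    """Map (x, y) to (delta(H(x), y), delta(H(x), y)).

    x is the client's input (the password) and y is the server's input
    (the stored hash). Both parties receive the same single bit.
    """
    bit = delta(H(x), y)
    return bit, bit
```

Because H is a parameter, the same model covers whichever hash function a legacy password database already uses.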

In the malicious model the adversaries may behave arbitrarily, i.e., lie about their inputs, abort, or
not follow the instructions of the protocol. Given a protocol that is secure in the semi-honest model,
the protocol can be transformed into a protocol secure in the malicious model [18, 19]. However, the
resulting protocols are very inefficient. Lindell and Pinkas [26] present a more efficient protocol for the
two-party case that is based on the informal cut-and-choose technique and is secure in the malicious
model. However, their protocol is also too slow for our purposes. In short, protocols that are secure in
the semi-honest model are efficient but not secure in our context, while protocols that are secure in
the malicious model are too inefficient to be useful in our context.

The adversary model we use in this paper is inspired by the covert model of Aumann and Lindell [5].
In the covert model, any attempt to cheat by the malicious protocol participant $A$ is detected by the
honest parties with probability at least $\epsilon$. In our model, we demonstrate that if a malicious SSH client
cheats, then, with high probability, the SSH server either detects the cheating, or computes exactly the
same result it would have computed if the client had not cheated.

4.1  Building  Blocks 

4.1.1  Oblivious  Transfer 

In our implementation, we use the oblivious transfer (OT) protocol by Naor and Pinkas [30]. This protocol
provides information-theoretic security for the chooser (SSH server in our implementation) and
computational security, based on the Diffie-Hellman assumption, for the sender (SSH client). This OT protocol
is a good choice for the SSH environment due to its efficiency. As an alternative, we could have
implemented our system using a fully simulatable oblivious transfer protocol such as, for example, the new
Diffie-Hellman-based OT protocol by Hazay and Lindell [21], whose computational complexity is
similar to that of the Naor-Pinkas protocol. We leave an implementation of this OT protocol as part of our
system to future work. The steps of the Naor-Pinkas OT protocol are:

1. Let $q$ be a prime number and let $g$ be a generator for the field $\mathbb{Z}_q$. Elements $q$ and $g$ are known
to both parties. The protocol uses a function $H$ which is assumed to be a random oracle. Also,
let $s \in \{0, 1\}$ be the chooser's secret choice, and let $M_0$ and $M_1$ be the sender's messages. At
the end of the protocol, the chooser should only learn $M_s$ and no party should learn any other
information.

2. The sender picks a random element $C \in \mathbb{Z}_q$ and sends $C$ to the chooser.

3. The chooser picks a random number $1 < k < q$ and computes two values, $PK_0$ and $PK_1$, where
$PK_s = g^k$ and $PK_{1-s} = C / PK_s$. The chooser sends $PK_0$ to the sender.

4. The sender encrypts $M_0$ and $M_1$. Specifically, the sender computes $PK_1 = C / PK_0$ and chooses
random values $r_0$ and $r_1$. It encrypts $M_b$ (where $b \in \{0, 1\}$) as
$E_b = (g^{r_b}, H(PK_b^{r_b}) \oplus M_b)$ and sends $E_0$ and $E_1$ to the chooser.

5. The chooser can compute $H((g^{r_s})^k) = H(PK_s^{r_s})$ and hence decrypt $E_s$ and compute $M_s$.
Intuitively, the chooser cannot decrypt $E_{1-s}$ unless the chooser can find $k'$ such that $g^{k'} = C / g^k$.
Therefore, the security of Naor-Pinkas OT depends on the computational Diffie-Hellman
assumption.
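
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated
assumptions, not the implementation evaluated in this report: the group parameters P, Q and G are toy
values (far too small to be secure), SHAKE-256 stands in for the random oracle $H$, and all function
names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption): P = 2*Q + 1 is a safe prime and G generates the
# subgroup of prime order Q. These values are far too small for real use.
P, Q, G = 2039, 1019, 4


def H(element: int, length: int) -> bytes:
    """Stand-in for the random oracle H, stretched to `length` bytes."""
    data = element.to_bytes((P.bit_length() + 7) // 8, "big")
    return hashlib.shake_256(data).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def random_exponent() -> int:
    return secrets.randbelow(Q - 1) + 1  # uniform in [1, Q-1]


def sender_pick_c() -> int:
    """Step 2: the sender picks a random group element C."""
    return pow(G, random_exponent(), P)


def chooser_keys(c: int, s: int) -> tuple[int, int]:
    """Step 3: returns (k, PK_0), where PK_s = g^k and PK_{1-s} = C / PK_s."""
    k = random_exponent()
    pk_s = pow(G, k, P)
    pk_other = c * pow(pk_s, -1, P) % P
    return k, (pk_s if s == 0 else pk_other)


def sender_encrypt(c: int, pk0: int, m0: bytes, m1: bytes):
    """Step 4: E_b = (g^{r_b}, H(PK_b^{r_b}) xor M_b) for b in {0, 1}."""
    pks = (pk0, c * pow(pk0, -1, P) % P)
    ciphertexts = []
    for pk, m in zip(pks, (m0, m1)):
        r = random_exponent()
        ciphertexts.append((pow(G, r, P), xor(H(pow(pk, r, P), len(m)), m)))
    return ciphertexts


def chooser_decrypt(k: int, s: int, ciphertexts) -> bytes:
    """Step 5: H((g^{r_s})^k) equals H(PK_s^{r_s}), so E_s can be opened."""
    g_r, body = ciphertexts[s]
    return xor(H(pow(g_r, k, P), len(body)), body)
```

A round trip calls sender_pick_c, chooser_keys, sender_encrypt and chooser_decrypt in that order; the
sender only ever sees $C$ and $PK_0$, and the chooser recovers the message selected by $s$.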

Note that the Naor-Pinkas OT protocol is not resilient to a man-in-the-middle (MITM) attack, i.e., an
attacker can change the second component of $E_b$ so that the chooser receives a message $M'_b$ of the
attacker's choosing. To counter this attack we change $E_b$ to
$(g^{r_b}, H(PK_b^{r_b}) \oplus (M_b \| H(M_b)))$.

4.1.2  Secure-Function  Evaluation  (SFE)

Consider any Boolean circuit $C$, and two parties (Alice and Bob), who wish to evaluate $C$ on their
respective inputs $x$ and $y$. In Yao's “garbled circuits” method [39], Alice transforms the circuit in such
a way that Bob can evaluate it obliviously, i.e., without learning Alice's inputs or the values on any
internal circuit wire except the output wires. The various steps are as follows:

1. For each wire $i$ of the circuit Alice generates two random keys $k_{i,0}$ and $k_{i,1}$ corresponding to 0 and
1. For all wires in the circuit except the input wires, the truth table for the corresponding Boolean
gate is encrypted. If $g(x, y)$ is a gate with input wires $j$ and $l$ and output wire $i$, then the truth
table value for $g(x, y)$ is encoded as $E_{k_{j,x}}(E_{k_{l,y}}(k_{i,g(x,y)}))$. Here, $k_{j,x}$ is the encryption key for
value $x$ of wire $j$, and similarly for $k_{l,y}$; $k_{i,g(x,y)}$ is the encryption key for the output wire of $g$ with
value $g(x, y)$. The four encrypted values representing $g(0, 0)$, $g(0, 1)$, $g(1, 0)$, and $g(1, 1)$ fully
specify the gate $g$. Alice sends the garbled circuit to Bob. Computation of the garbled circuit does
not depend on input values and can be performed in advance. However, the same garbled circuit
must not be used more than once, or Alice's privacy may be violated.

2. Alice sends the keys corresponding to her own input wires to Bob. Bob obtains the keys
corresponding to his input wires from Alice using the oblivious-transfer $OT_1^2$ protocol. For each of
Bob's input wires, Bob acts as the chooser using his corresponding input bit to the function as the
choice into $OT_1^2$, and Alice acts as the sender with the two wire keys for that wire as her inputs
into $OT_1^2$.

3. Bob evaluates the circuit. Because of the way that the garbled circuit is constructed, Bob, having
one wire key for each gate input, can decrypt exactly one row of the garbled truth table and
obtain the key encoding the value of the output wire. Yao's protocol maintains the invariant that
for every circuit wire, Bob learns exactly one wire key. Because wire keys are random and the
mapping from wire keys to values is not known to Bob (except for the wire keys corresponding to
his own inputs), this does not leak any information about actual wire values.

After these steps the circuit has been evaluated obliviously by Bob. The final step is for Bob to send to
Alice her output wire keys, from which she will learn her designated outputs. A complete description
of this protocol along with a formal proof of correctness appears in [27].
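
To make the garbling and evaluation steps concrete, the following Python sketch garbles and evaluates a
single gate. It is an illustration under stated assumptions rather than the construction used in [39] or
[27]: instead of a nested encryption, each truth-table row is masked with a SHAKE-256 pad derived from
the two input wire keys, and an all-zero tag (added here for the sketch) lets the evaluator recognize the
one row it can open. The names new_wire_keys, garble_gate and evaluate_gate are hypothetical.

```python
import hashlib
import secrets

KEY_LEN = 16              # bytes per wire key (assumption)
TAG = bytes(KEY_LEN)      # all-zero tag marking a correctly opened row


def new_wire_keys() -> tuple[bytes, bytes]:
    """Two random keys for one wire: index 0 encodes bit 0, index 1 bit 1."""
    return secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN)


def mask(key_a: bytes, key_b: bytes, data: bytes) -> bytes:
    """XOR `data` with a pad derived from both input wire keys."""
    pad = hashlib.shake_256(key_a + key_b).digest(len(data))
    return bytes(x ^ y for x, y in zip(pad, data))


def garble_gate(gate, keys_j, keys_l, keys_i) -> list[bytes]:
    """Encrypt the output-wire key for each of the four input combinations."""
    rows = [
        mask(keys_j[x], keys_l[y], keys_i[gate(x, y)] + TAG)
        for x in (0, 1)
        for y in (0, 1)
    ]
    secrets.SystemRandom().shuffle(rows)  # hide which row encodes which inputs
    return rows


def evaluate_gate(rows: list[bytes], key_j: bytes, key_l: bytes) -> bytes:
    """Open the single row that the two held wire keys can unmask."""
    for row in rows:
        plain = mask(key_j, key_l, row)
        if plain.endswith(TAG):
            return plain[:KEY_LEN]
    raise ValueError("wire keys do not open any row of this gate")
```

Holding one key per input wire, the evaluator obtains exactly one output-wire key and learns nothing
about which bit it encodes. A full circuit would chain such gates and would also bind each pad to a gate
identifier, which this sketch omits.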

A protocol for secure-function evaluation based on Yao's protocol that is secure in the covert model
is presented by Aumann and Lindell [5]. Our protocol is similar to the protocol presented by Aumann
and Lindell, but exploits the context in which the protocol is used.

Our protocol relies on the fact that in the context of password-based authentication, there is no
difference from the client's viewpoint between failure due to submitting a wrong password and failure
due to a malformed circuit. This enables us to achieve security against malicious clients at a much lower
cost than the Aumann-Lindell protocol, where the client may learn partial information about the server's
input by cleverly creating malformed circuits and seeing which of them were detected. Moreover, by
bundling keys in the oblivious-transfer step, any tampering by the man-in-the-middle is detected with
high probability.

4.2  Protocol  1:  Strawman  Protocol

Recall that the client ($C$) has the password $x = P$ and the server ($S$) has the hash of the password
$y = H(P)$. The protocol works as follows:

• Client hashes the password and obtains $x' = H(x)$.

• Client and server use the protocol that is secure in the covert model from [5, Section 6.2] for the
function $f(x', y) = \delta(x', y)$.

• After the protocol the client and server know whether their inputs are the same.

The protocol given in [5] has several parameters. Note that the Naor-Pinkas OT protocol provides
unconditional security for the server. Therefore, we have the client send the garbled circuits to the server,
so the server acts as chooser in the underlying OT protocol. Moreover, if each bit of the server's input
is split into $m$ bits and the cut-and-choose is performed over $l$ circuits, then the protocol is $\epsilon$-deterrent
where $\epsilon = (1 - 1/l)(1 - 2^{-m+1})$.
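
As a worked example of the deterrence formula, the short Python sketch below evaluates $\epsilon$ exactly for
given $l$ and $m$. The function name deterrence is hypothetical, and the parameter values in the usage note
are illustrative only; they are not the values used in our evaluation.

```python
from fractions import Fraction


def deterrence(l: int, m: int) -> Fraction:
    """epsilon = (1 - 1/l) * (1 - 2^(-m+1)), computed exactly."""
    if l < 1 or m < 1:
        raise ValueError("l and m must be positive")
    return (1 - Fraction(1, l)) * (1 - Fraction(2) ** (1 - m))
```

For instance, deterrence(2, 2) is 1/4, while deterrence(10, 8) is 1143/1280 (about 0.89): increasing either
the number of circuits or the number of bits per input bit pushes $\epsilon$ toward 1.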

The straw man protocol has a vulnerability which defeats the entire purpose of storing passwords
on the server in the hashed form. To successfully authenticate as a client in this protocol, it is sufficient
to know only the hash of the password rather than the password itself. First, this means that if the
server is compromised, then the attacker can impersonate any client whose password was stored on the
compromised server, even if these passwords were stored in a hashed form. Second, if the server is
malicious, then it can impersonate any client who successfully authenticates to it. Nevertheless, the
straw man protocol may be useful in certain environments with relaxed security requirements.

4.3  Protocol  2:  Main  Protocol 

We now present our main protocol. Recall that in the SSH context, there are two parties in the protocol:
party 1 (client) has input $x$ and party 2 (server) has input $y$. They want to jointly compute the
functionality $(x, y) \mapsto (\delta(H(x), y), \delta(H(x), y))$ where $\delta(H(x), y)$ is equal to 1 if $H(x) = y$; otherwise it is 0.
If the client gets an output of 1, it means that the client was authenticated by the server. The reader should
interpret $x$ as the password and $y$ as the hash of the password (in other words, the client should only be able
to successfully authenticate if he knows the password whose hash matches what the server has). The
key idea is that the hash function $H$ is included in the functionality, which makes our protocol resilient
against malicious servers impersonating clients (see Section 4.2): knowledge of the password hash is
not sufficient to authenticate as the client.

The following protocol description assumes that the reader is familiar with the basics of
secure-function evaluation (such as garbled circuit construction and oblivious transfer).

• (Step 1) Client creates $l$ garbled circuits $C_1, \ldots, C_l$ for $\delta(H(x), y)$. Let the server's input
$y = y_1 \cdots y_m$ be $m$ bits. The wire keys corresponding to the $j$-th bit of the server's input for the $i$-th
garbled circuit $C_i$ are denoted by $k^i_{j,0}$ and $k^i_{j,1}$. Client sends circuits $C_1, C_2, \ldots, C_l$ to the server.

• (Step 2) Client and server execute the $OT_1^2$ protocol $m$ times. In the $j$-th instance of $OT_1^2$ the
client acts as the sender with inputs $k^1_{j,0} \| \cdots \| k^l_{j,0}$ and $k^1_{j,1} \| \cdots \| k^l_{j,1}$, and the server acts
as the chooser with input $y_j$ (the $j$-th bit of the input). Notice that concatenating the keys prevents
the server from learning keys corresponding to different bits, e.g., the server cannot learn keys $k^1_{j,0}$
and $k^2_{j,1}$.

• (Step 3) Server chooses a random set $S \subset \{1, 2, \ldots, l\}$ and sends $S$ to the client.

• (Step 4) Client reveals wire keys for circuits $C_j$ such that $j \in S$ to the server (we call this step
opening the circuits $C_j$ such that $j \in S$). Client also provides wire keys for its input $x$ for circuits
$C_j$ such that $j \notin S$.

• (Step 5) If the circuits $C_j$ ($j \in S$) are not well-formed (the circuits do not compute $\delta(H(x), y)$ or
the keys are not consistent with what was sent in step 2), the server sends 0 to the client. Server
computes $C_j$ ($j \notin S$) and obtains answers $o_j$ ($j \notin S$). Server sends $\bigwedge_{j \notin S} o_j$ to the client.
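
The control flow of steps 3 to 5 can be sketched in Python with all cryptography abstracted away. In this
hypothetical model a "circuit" is just a Python callable, an opened circuit counts as well-formed only
if it is the reference function honest_circuit, SHA-256 stands in for $H$, and the client supplies one input
per circuit. None of these names come from our implementation; the sketch only illustrates how the
server's single response bit is formed.

```python
import hashlib
import secrets


def H(password: bytes) -> bytes:
    return hashlib.sha256(password).digest()  # stand-in for the hash H


def honest_circuit(x: bytes, y: bytes) -> int:
    """What a well-formed circuit computes: delta(H(x), y)."""
    return int(H(x) == y)


def choose_opened(l: int) -> set[int]:
    """Step 3: a random subset S that leaves at least one circuit unopened."""
    while True:
        opened = {j for j in range(l) if secrets.randbits(1)}
        if len(opened) < l:
            return opened


def server_response(circuits, client_inputs, y: bytes, opened: set[int]) -> int:
    """Step 5: reject on a malformed opened circuit, else AND the rest."""
    if len(opened) >= len(circuits):
        raise ValueError("at least one circuit must stay unopened")
    if any(circuits[j] is not honest_circuit for j in opened):
        return 0
    unopened = (j for j in range(len(circuits)) if j not in opened)
    return int(all(circuits[j](client_inputs[j], y) for j in unopened))
```

With honest circuits the response is 1 exactly when the submitted password hashes to $y$. If a cheating
client mixes in circuits that always output 1, the response is still 0 unless every opened circuit is
honest and every unopened circuit is a cheating one.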

It is clear that if both client and server are honest, then the client will successfully authenticate to the
server if and only if it has a password $x$ whose hash $H(x)$ is equal to the input of the server $y$. Some of
the important features of our protocol are:

• The server learns the wire keys corresponding to one input. In other words, it is not possible for
the server to evaluate circuit $C_i$ on input $x_1$ and circuit $C_j$ ($j \ne i$) on a different input $x_2$. This
is the rationale behind concatenating the wire keys in step 2 of the protocol. It ensures that a
malicious server cannot enter more than one password hash into the computation in an attempt to
learn the client's input.

• Assume that out of the $l$ garbled circuits $C_1, \ldots, C_l$ the circuits with index $j \in D$ (where
$D \subset \{1, 2, \ldots, l\}$) are not valid. The only way the client's cheating is not detected is if $D$ is a subset
of $\neg S$ (the complement of $S$), i.e., all invalid circuits are in the unopened set.

• The server's response to the client is computed as the conjunction ($\wedge$) of the outputs of all
unopened circuits (Step 5). This exploits the essential feature of password authentication, namely, that
the client receives a single bit from the server.

As long as the password submitted by the client is wrong and at least one of the unopened circuits
is correct (i.e., it correctly computes the hash of the client's input and compares it for equality
with the server's input), the server's answer will be 0: “failed authentication attempt.” Therefore, a
malicious client does not learn anything by submitting invalid circuits, unless all unopened circuits
are invalid. The outputs of the invalid circuits are effectively hidden from the client by the output
of a single correct circuit. By contrast, the generic construction for the malicious model [26]
requires that the majority of unopened circuits be correct to prevent information leakages.

If the client's input is the correct password (i.e., its hash is equal to the server's input), then the
client can compute the server's input on his own. Therefore, the client cannot possibly learn
anything from the protocol execution, except a single bit confirming that his input is correct.

Observe that a malicious client who does not know the password will successfully authenticate
(i.e., receive bit 1 rather than 0 as his output of the protocol) if and only if all unopened circuits
are invalid, i.e., the set of invalid circuits $D$ is exactly $\neg S$. Because $S$ is chosen randomly, the
probability of this event is $2^{-l}$.

• There is no consistency check on the client's inputs. A malicious client may input different
passwords into different circuits. Recall that with high probability, the unopened set contains at least
one correct circuit. Clearly, submitting a wrong password to a correct circuit will result in
authentication failure. Therefore, the only situation in which the client will authenticate is if he
consistently submits the correct password to every correctly formed, unopened circuit. We argue
that this is equivalent to knowing the correct password in the first place, i.e., submitting
inconsistent inputs does not offer any benefits to a malicious client.

If the client submits inconsistent inputs and authentication fails, the client does not learn which of
the inputs were correct and which were incorrect. Therefore, the client is still limited to a single
password per authentication attempt.
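
The $2^{-l}$ figure quoted in the features above can be checked by brute force for small $l$. The Python
sketch below enumerates every subset $S$ of $\{1, \ldots, l\}$ and counts those for which a client who does not
know the password is accepted, i.e., those where the unopened set equals the set $D$ of invalid circuits.
It assumes, as that feature does, that $S$ is uniform over all $2^l$ subsets; the function names are
hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def all_subsets(l: int):
    """Yield every subset of {1, ..., l} as a Python set."""
    indices = range(1, l + 1)
    for size in range(l + 1):
        for subset in combinations(indices, size):
            yield set(subset)


def cheat_success_probability(l: int, invalid: set[int]) -> Fraction:
    """Pr over uniform S that a client without the password gets output 1."""
    everything = set(range(1, l + 1))
    wins = sum(1 for opened in all_subsets(l) if everything - opened == invalid)
    return Fraction(wins, 2 ** l)
```

Whatever non-empty set of invalid circuits the client picks, exactly one choice of $S$ lets the cheating
go through, so the result is always $2^{-l}$; for example, cheat_success_probability(4, {2, 3}) is 1/16.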

We  formally  argue  the  protocol  preserves  the  privacy  of  both  parties'  inputs. 

Client's privacy: Assume that the client is honest and the server is controlled by an adversary $A$. $A$'s
view consists of the $l$ garbled circuits $C_1, \ldots, C_l$, messages received during the $m$ $OT_1^2$ protocols, all
keys for $C_j$ ($j \in S$, where $S \subset \{1, 2, \ldots, l\}$ is chosen by $A$), and keys corresponding to the client's
input $x$ for circuits $C_j$ ($j \notin S$). Assume that views corresponding to the $m$ $OT_1^2$ protocols only reveal
the secrets corresponding to the input $y$ of $A$ (let the input of $A$ be $y = y_1 \cdots y_m$; then the server learns
$k^j_{k,y_k}$ for $1 \le j \le l$ and $1 \le k \le m$). This follows from the privacy of the underlying oblivious-transfer
protocol. For example, if one uses the Naor-Pinkas oblivious-transfer protocol, then we have
information-theoretic security for the server. Assuming that the encryption scheme used to construct the garbled
circuits is semantically secure, revealing the wire keys for circuits $C_j$ ($j \in S$) does not reveal any
information about the client's input. Consider the circuits $C_j$ ($j \notin S$). The server can evaluate each such
circuit on $(x, y)$ but learns nothing else. Consider an ensemble of garbled circuits $C'_j$ ($j \notin S$), where $C'_j$
computes the constant function $\delta(H(x), y)$ (see footnote 1). If the encryption scheme is semantically
secure, then $A$ cannot distinguish between circuits $C_j$ and $C'_j$ (for $j \notin S$). Essentially, $A$ only learns
whether the hash of the client's password is equal to its input and nothing else.

Footnote 1: We assume $C'_j$ is constructed from the same encryption scheme that was used to construct
$C_1, \ldots, C_l$.

Server's privacy: First we give an informal sketch for the server's privacy. Assume that the server's input
is $y$. We now show that for a malicious client who does not know a password $x'$ such that
$H(x') = y$, the probability of successful authentication to the server (or impersonation) is no better than
$2^{-l}$. In other words, the probability that a malicious client who does not know the pre-image of the
server's input successfully impersonates an honest client is bounded by $2^{-l}$. The use of even a modestly
large value of the parameter $l$ will make it more likely that the adversary can simply guess the password
than break the protocol. We consider this sufficient, but if desired, extremely large values of $l$ can be
used to make the probability of breaking the protocol negligible, with a performance penalty linear in
the value of $l$.

In particular, we show that the protocol is secure unless the client perfectly guesses the subset $S$
of the $l$ circuits that the server will choose and prepares the encrypted circuits accordingly.

It is sufficient to assume that the malicious client does not know the correct password. If the client
knows the password then there is no information to be learned from the server that he does not already
possess. There is no useful purpose to cheating the protocol, since he would achieve the desired outcome
by executing it faithfully. In this case we simply do not care if the client cheats because he only hurts
himself. There are three possible cases:

• (Case 1) The client includes an invalid circuit in $S$. The server will detect this and reject the
authentication in step 5.

• (Case 2) Every circuit in $S$ is correct and $\neg S$ (which denotes the complement of $S$) includes at
least one valid circuit. When the server evaluates this circuit, it will evaluate to 0 and the server
will reject the authentication. Recall that if a circuit is valid, it will evaluate to 0 on the inputs of
the client and server because the client does not know the pre-image of the server's input.

• (Case 3) Every circuit in $S$ is correct and every circuit in $\neg S$ is incorrect. We make no claims of
correctness about this case. In particular, the client could have made every circuit in $\neg S$ evaluate
to 1, in which case the server would accept the impersonating client as authentic.

Since the server chooses the subset $S$ uniformly from the space of proper subsets, the probability of
case 3 happening is $2^{-l}$.

Assume that the server is honest and the client is malicious. The client is controlled by an adversary
$A$. We construct a simulator Sim which works in the ideal model. Recall that in the ideal model the joint
computation is performed using a trusted party (TP). Sim acts as the server for $A$.

• $A$ sends $l$ copies of the garbled circuits $C_1, \ldots, C_l$ to Sim.

• Sim acts as TP for $A$ for the $m$ oblivious transfer protocols. Sim knows the inputs of $A$ (which in
the case of an honest client are the wire keys corresponding to the server's inputs).

• Sim chooses a random set $S_1 \subset \{1, 2, \ldots, l\}$ and sends it to $A$.

• $A$ sends all the wire keys corresponding to circuits $C_j$ ($j \in S_1$).

• Sim rewinds $A$, and sends the complement of $S_1$ to $A$. $A$ sends all the wire keys corresponding
to circuits $C_j$ ($j \notin S_1$). Note that after this step Sim knows the wire keys corresponding to all the
garbled circuits $C_1, \ldots, C_l$.

• Sim rewinds $A$, picks a random set $S \subset \{1, 2, \ldots, l\}$, and sends it to $A$. $A$ sends wire keys
corresponding to the circuits $C_j$ ($j \in S$).

• $A$ provides the wire keys for all circuits $C_j$ ($j \notin S$). Note that since Sim knows the wire keys for
all the garbled circuits, it can now reconstruct $A$'s inputs $x_j$ to circuits $C_j$ ($j \notin S$). If the inputs are
inconsistent (i.e., not all equal to the same value), Sim sends 0 to $A$. If all inputs ($j \notin S$) are
equal to $x$, then Sim sends $x$ to the TP. If TP returns 1 (which means that $A$ knew the pre-image
of the server's input), then Sim sends 1 to $A$ (which essentially means that the malicious client was
authenticated). If TP returns 0, we proceed to the next step.

• Sim checks the validity of all the circuits $C_j$ ($j \in S$). If any of these circuits is found to be invalid,
then Sim sends 0 to $A$. Otherwise Sim sends 1 to $A$.

Assume that $A$ does not know the pre-image corresponding to the server's input. Suppose the inputs
$x_j$ ($j \notin S$) are not equal. In this case, $A$ receives 0 from Sim. We argue that in the real model $A$ would
receive 1 only if it knows the pre-image corresponding to the server's input (a contradiction). The only
way $A$ receives a 1 is if all of his inputs into correctly formed, unopened circuits are pre-images of the
server's input, which means $A$ knew the pre-image of the server's input to begin with, contradicting
our assumption. Hence the views of $A$ in the ideal and real world are the same when the inputs $x_j$
($j \notin S$) are not equal, unless all opened circuits are valid and all of the unopened circuits are invalid
(the probability of this event is $2^{-l}$).

Now assume that inputs $x_j$ ($j \notin S$) are all equal to $x$. Let $E_1$ be the event that an honest server denies
authentication to $A$ in the real model, and $E_2$ be the event that Sim denies authentication to $A$ in the
ideal model. Recall that denying authentication is tantamount to $A$ receiving 0. It is easy to see that if
$E_1$ and $E_2$ are true, then the view of $A$ in the real model is indistinguishable from the view of $A$ in the
ideal model. The probability of event $E_1 \wedge E_2$ not happening is bounded by $2^{-l+1}$.

We conclude that if the client does not know the pre-image of the server's input, then the view of $A$
in the real model is indistinguishable from the view of $A$ in the ideal model with
probability at least $1 - 2^{-l+1}$. In other words, conditioned on the event that the malicious client does not
know the pre-image of the server's input, the probability of the simulator failing is bounded by $2^{-l+1}$.
This model is very similar to the “failed simulation” model given by Aumann and Lindell [5, Section
3.2].

4.3.1  Security  against  man-in-the-middle  attacks 

Observing an instance of our protocol yields no information that will be useful in subsequent instances of
the protocol (e.g., it does not reveal the parties' inputs). Consider a man-in-the-middle (MITM) attacker
who captures all messages exchanged between the client and the server. The attacker will not be able to
replay a message from an old session because various steps in the protocol use fresh, randomly generated
values. For example, in each instance of the Naor-Pinkas oblivious-transfer protocol, the chooser (in
our case the SSH server) generates a random value $k$ and sends $g^k$ or $C / g^k$ (where $g$ is a generator of the
underlying group and $C$ is an element in the group). Therefore, observing the client's and server's
inputs into our protocol does not reveal the password hash, nor any other information that can be used
in subsequent sessions.

Now consider a MITM attacker who attempts to learn information about the inputs by tampering
with an instance of the protocol between a valid client (who knows the correct password) and a valid
server (who knows the correct password hash). Specifically, MITM attempts to make the authentication
attempt fail conditionally, depending on a bit of the server's input. If successful, this would leak 1 bit of
the password hash, violating security of the protocol. We show that any tampering causes the server to
reject authentication with near certainty, and is thus equivalent to simply attempting to authenticate with
a wrong password.

Note that because the client's inputs into the oblivious transfer protocol are information-theoretically
secure, only the server's inputs (i.e., password hash) can be attacked in this way. Without loss of
generality, suppose MITM is attempting to learn the first bit of the server's input.

The intuition why this attack doesn't work is as follows. First, any tampering must modify both the
circuit(s) and the corresponding wire keys, or it will be detected when the server verifies opened circuits.
Second, in the oblivious transfers used by our protocol for each bit of the server input, the wire keys
representing this bit in all circuits are “bundled” together and transferred concatenated as a single value,
with an integrity check. As we show, MITM cannot tamper with or replace the keys used in a single
circuit; any tampering will affect all circuits received by the server, ensuring detection with a very high
probability.

Attack 1. MITM tries to replace circuit $C_i$ with circuit $C'_i$ where $C'_i$ simply returns the first bit of the
password hash. This attack does not work because MITM does not know the wire keys used to construct
circuit $C_i$. There are two cases:

1. $C'_i \in S$, the set of opened circuits. In this case, the server will see that $C'_i$ is not a valid circuit
and will reject the authentication attempt.

2. $C'_i \notin S$. In this case, when the server attempts to evaluate circuit $C'_i$, the Yao circuit evaluation
will fail because the wire keys are invalid. Again, the server will reject the authentication attempt.

Attack 2. MITM tries to tamper with $OT_1^2$ as a black box. First, MITM impersonates the server in $OT_1^2$
with the client for the first bit of inputs into each circuit $C_i$, $1 \le i \le l$. This enables MITM to retrieve
the wire keys $k^1_{0,0} \| \cdots \| k^l_{0,0}$ encoding this bit in all circuits.

MITM lets the OTs between the client and the server proceed normally for the remaining bits
$k_1, \ldots, k_{m-1}$. For the first bit $k_0$, MITM impersonates the sender and lets the server choose between
$k^1_{0,0} \| \cdots \| k^l_{0,0}$ and $k^1_{0,1} \| \cdots \| k^l_{0,1}$, where $k^1_{0,1} \| \cdots \| k^l_{0,1}$ have been generated by MITM.

This attack will fail when the server verifies the circuits in $S$. Because $k^1_{0,1} \| \cdots \| k^l_{0,1}$ are not valid
wire keys, the server will detect that the circuits in $S$ are malformed and will reject the authentication
attempt.

Attack 3. MITM tries to tamper with $OT_1^2$ by modifying specific messages in the Naor-Pinkas OT
protocol.

1. MITM replaces the entire message $k^1_{0,0} \| \cdots \| k^l_{0,0}$. This will be detected for the same reason as
in Attack 2.

2. MITM replaces some, but not all, of the wire keys $K' \subset \{k^1_{0,0}, \ldots, k^l_{0,0}\}$. Assume MITM
attempts to tamper with the 0-wire key only for the first circuit, $k^1_{0,0}$, while leaving the other
wire keys intact. Observe, however, that the concatenation $k^1_{0,0} \| \cdots \| k^l_{0,0}$ is transferred as a
single encrypted value, and in the Naor-Pinkas oblivious transfer, MITM cannot tamper with the
transferred value (see Section 4.1.1). Therefore, MITM cannot selectively replace only some of
the wire keys: the integrity hash will not match and tampering will be detected by the server, who
will always reject the authentication attempt even if the client's original password was correct
(thus leaking no information to MITM).
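
The bundling and integrity check that Attack 3 runs into can be sketched in Python as follows. This shows
only the plaintext side of the check, $M \| H(M)$ with $M$ the concatenation of one wire key per circuit; in
the protocol this value is additionally masked inside the oblivious transfer. SHA-256, the 16-byte key
length, and the function names bundle and unbundle are assumptions made for the sketch.

```python
import hashlib
import hmac

KEY_LEN = 16      # bytes per wire key (assumption)
DIGEST_LEN = 32   # SHA-256 output size


def bundle(wire_keys: list[bytes]) -> bytes:
    """Concatenate one wire key per circuit and append H(M)."""
    message = b"".join(wire_keys)
    return message + hashlib.sha256(message).digest()


def unbundle(blob: bytes) -> list[bytes]:
    """Verify the integrity hash, then split the bundle into wire keys."""
    message, digest = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    if not hmac.compare_digest(hashlib.sha256(message).digest(), digest):
        raise ValueError("integrity check failed: reject authentication")
    return [message[i:i + KEY_LEN] for i in range(0, len(message), KEY_LEN)]
```

Flipping even a single bit of one wire key inside the bundle makes unbundle raise, so the server rejects
the attempt regardless of which circuit the modified key belonged to.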

4.4  Protocol  3:  Adding  key  establishment 

In  the  context  of  SSH,  client  and  server  need  to  compare  their  inputs  and  also  establish  a  session  key 
if  the  comparison  between  their  inputs  is  successful.  The  protocol  is  an  easy  extension  of  our  main 
protocol. 

• Client picks a random key $K$. Client and server execute a variation of the main protocol that
computes the following functionality:

$$((x, K), y) \mapsto \text{if } (H(x) = y) \text{ then } (K, K) \text{ else } (\bot, \bot)$$
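
Written as an ordinary function, the functionality above looks as follows. This Python sketch is what a
trusted party would compute in the ideal model; it is not the secure two-party protocol itself. SHA-256
stands in for $H$, None stands in for $\bot$, and the name key_establishment is hypothetical.

```python
import hashlib
import hmac


def H(password: bytes) -> bytes:
    return hashlib.sha256(password).digest()  # stand-in for the hash H


def key_establishment(x: bytes, key: bytes, y: bytes):
    """((x, K), y) -> (K, K) if H(x) = y, otherwise (None, None)."""
    if hmac.compare_digest(H(x), y):
        return key, key
    return None, None
```

Both parties receive the same session key when the password matches, and neither receives anything
otherwise, which mirrors the single-bit outcome of the main protocol.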

A malicious client may pick a key $K$ that is not truly random. However, if the client is malicious, it
can decrypt and forward its messages to an adversary regardless of this. On the other hand, suppose
the server is malicious. If the malicious server knows the hash $H(x)$ of the client's input $x$, then it knows
the session key $K$. In this case, the server can again forward the unencrypted messages to the adversary.
If the server does not know $H(x)$, then it cannot obtain the session key. In other words, adding the extra
functionality of distributing session keys does not affect the security of protocol 2 (which only compares
$H(x)$ and $y$).

Figure 4: The SFE user authentication protocol 2.


5  Evaluation 

We modified an existing open-source SSH client and server to use each of the protocols described in
Section 4 and took several performance measurements to evaluate the feasibility of our approach. We
implemented and tested the salted MD5 and SHA-512 hashing methods commonly used in Linux
distributions. Our findings can be summarized as follows:

•  Protocols that are secure in the semi-honest model, which is only secure against passive
adversaries, can be executed very quickly.

•  Making the protocols secure against active adversaries increases authentication time substantially,
depending on the size of the authentication circuit and the security parameter. For example,
calculating an MD5 hash using 90 circuits increases the authentication time from around 2 seconds
to 12 seconds. Although this may seem tedious for some users, we achieve a high level of security
with only modest delays on inexpensive modern hardware. We conclude that the technique can
achieve a favorable balance between practical efficiency and high security on systems using MD5
hashing.

•  Due to the simplicity of private equality testing, Protocol 1 (the strawman protocol) can run
extremely quickly even under the covert model, at the expense of no longer resisting impersonation
by an adversary who has gained knowledge of a user's password hash. If this security requirement
can be relaxed (for example, in an environment where passwords are stored on the server using an
identity hash, i.e., in plaintext), Protocol 1 could be a useful high-speed authentication protocol.

5.1  Implementation 

We implemented the protocols by modifying the Dropbear 0.52 SSH client and server to support a new
authentication protocol, to which we assigned the name “sfeauth” in the SSH authentication protocol
namespace. The scheme used for incorporating the protocols into the SSH protocol is shown in Figure 4.
Our protocol is executed through the encrypted SSH tunnel using a reserved message which we dubbed
SSH_MSG_USERAUTH_SFEMSG. If at any time the server detects a cheating attempt by the client, the
server denies the authentication and terminates the protocol.

The MD5, SHA-512, and private equality circuits were implemented using a prototype circuit
compiler that we developed, first described in [23], which also contains an embeddable implementation
of the Yao “garbled circuit” protocol. The protocol was extended using the techniques due to Lindell and
Pinkas [20] to add resistance to malicious parties in the covert model. The implementation also uses the
oblivious transfer protocol due to Naor and Pinkas [30]. All of the Yao protocol and authentication code
was written in C++ and integrated with the Dropbear SSH client and server.

5.1.1  Optimizations 

To improve performance of the protocols, we introduced several optimizations to our implementation.

1.  The client computes the garbled circuits used in the authentication protocol in advance of
interacting with the online protocol. By precomputing and storing garbled circuits, the time spent in
this CPU-intensive step is removed from the user’s perceived wait time to log in to a server.

2.  We implemented an optimization described by Goyal et al. [20]. In constructing the garbled Yao
circuits, the client generates a set of seeds for a cryptographically secure pseudorandom number
generator (PRNG). The circuits are garbled using this PRNG, with one seed per circuit, and hashes
of the circuits are sent in place of the whole circuits. After the server has chosen which circuits
to open, the seeds for the non-chosen circuits are revealed to the server, who then uses the PRNG
to reconstruct the garbled circuit and verify the hash values, and only the circuits to be evaluated
are transferred in full. This saves many megabytes of wire communication, improving the overall
protocol performance.
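
The message flow of the second optimization can be sketched as follows. This is a toy model under
stated assumptions, not the prototype's C++ code: `garble_from_seed` is a hypothetical placeholder
that simply expands a seed into pseudorandom bytes, SHA-256 stands in for the circuit hash, and
opening half of the circuits is an arbitrary choice made for the example.

```python
import hashlib
import secrets

CIRCUIT_BYTES = 1 << 16  # hypothetical size of one garbled circuit


def garble_from_seed(seed: bytes) -> bytes:
    """Placeholder for garbling: all randomness is derived from the seed."""
    return hashlib.shake_256(b"garble|" + seed).digest(CIRCUIT_BYTES)


def circuit_hash(circuit: bytes) -> bytes:
    return hashlib.sha256(circuit).digest()


def client_commit(n: int):
    """Client: one seed per circuit; only the circuit hashes are sent at first."""
    seeds = [secrets.token_bytes(16) for _ in range(n)]
    hashes = [circuit_hash(garble_from_seed(s)) for s in seeds]
    return seeds, hashes


def server_check(hashes, revealed_seeds, full_circuits) -> bool:
    """Server: rebuild opened circuits from their seeds and hash the rest."""
    for i, seed in revealed_seeds.items():
        if circuit_hash(garble_from_seed(seed)) != hashes[i]:
            return False
    for i, circuit in full_circuits.items():
        if circuit_hash(circuit) != hashes[i]:
            return False
    return True


n = 8
seeds, hashes = client_commit(n)
# Server's random choice of circuits to evaluate; the others are opened via seeds.
evaluate = set(secrets.SystemRandom().sample(range(n), n // 2))
revealed = {i: seeds[i] for i in range(n) if i not in evaluate}
in_full = {i: garble_from_seed(seeds[i]) for i in evaluate}
assert server_check(hashes, revealed, in_full)

sent = (sum(map(len, hashes)) + sum(map(len, revealed.values()))
        + sum(map(len, in_full.values())))
print(f"bytes sent: {sent}, versus {n * CIRCUIT_BYTES} for all circuits in full")
```

Only the circuits selected for evaluation cross the wire in full; each opened circuit costs one seed
plus one hash, which is where the communication saving comes from.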

The prototype SSH client and server, as well as further documentation, can be downloaded from our
project website.2

5.2  Experiments 

We conducted several usage experiments to measure the performance of the authentication, and
determine its feasibility in real settings. Note that the semi-honest version is not secure for real-world
usage where the possibility of active malicious adversaries cannot be ruled out, but the experiment is
useful to establish an upper bound for the potential performance with further optimizations. The tests
were performed over a local network using computers with eight-core Intel Xeon processors and 8GB of
RAM.

The performance results of our experiments are shown in Table 1. The first row corresponds to the
semi-honest version of the protocol, and is the time on which the ratio to semi-honest column for other
rows is based. The column titled probability of attack success refers to the probability of a malicious
client successfully convincing the server that he has the proper credentials to authenticate. This
calculation is discussed in detail in Section 4.3. As our results indicate, the time required to complete
the protocol increases linearly as the number of circuits increases, while the security guarantee
increases exponentially in this measure. Note that for less than an order of magnitude increase over the
semi-honest implementation, sufficient security guarantees for many practical settings can be attained,
especially when using the MD5 protocol.
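
The mantissas listed in Table 1 (1.86, 1.73, 1.62, 1.50, 1.40) match $2^{-(s-1)}$ expressed as a
percentage for s = 30, 60, 90, 120, 150 circuits. Assuming that closed form (it is inferred from the
table values, not quoted from Section 4.3), the linear cost and exponential security gain can be
tabulated together; the timings are the MD5 online times from Table 1.

```python
# Assumption: attack success probability is 2^-(s-1) for s circuits,
# a form inferred from the mantissas reported in Table 1.
md5_online_sec = {30: 3.99, 60: 7.86, 90: 12.35, 120: 16.46, 150: 20.57}


def attack_success_percent(s: int) -> float:
    """Probability of attack success, in percent, under the assumed form."""
    return 100.0 * 2.0 ** -(s - 1)


for s, t in md5_online_sec.items():
    print(f"{s:>3} circuits: {t:5.2f} s ({t / s:.3f} s per circuit), "
          f"attack success {attack_success_percent(s):.2e} %")
```

The per-circuit time stays roughly constant across rows, while each additional circuit halves the
assumed success probability.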

# Circuits        MD5 Online   MD5 Ratio to   SHA-512 Online   SHA-512 Ratio    Probability of
                  Time (sec)   Semi-Honest    Time (sec)       to Semi-Honest   attack success
1 (Semi-Honest)   0.52         1.0            3.02             1.0              100%
30                3.99         7.7            28.8             9.5              1.86 x 10^-7 %
60                7.86         15.2           62.6             20.7             1.73 x 10^-16 %
90                12.35        23.8           85.0             28.1             1.62 x 10^-25 %
120               16.46        31.8           N/A              N/A              1.50 x 10^-34 %
150               20.57        39.6           N/A              N/A              1.40 x 10^-43 %

Table 1: Wall-clock performance and security guarantees for the optimized protocol in both semi-honest
and covert settings. All times are given in seconds.

The experiments on 120 and 150 circuits for SHA-512 could not be completed. This is due to
the complexity of the SHA-512 circuit, which has close to 124,000 gates, compared to under 18,000
for the MD5 circuit. Because of this, our test machines did not have enough RAM to hold 120 or more
encrypted copies of the circuit. SHA-512 uses 80 rounds of its compression function. As a performance
optimization, it would be possible to use a reduced-round variant of the circuit, to which the client
supplies as input the output after the first N rounds, and the circuit computes the remaining 80 - N
rounds. Such an optimization would reduce the margin of safety built into the SHA-512 hash function
against pre-image attacks, if the password database were compromised. However, such an optimization
would reduce the size of the circuit by a factor of $\frac{80}{80-N}$, with a corresponding increase in
performance. Naturally, this optimization would need to be weighed carefully against the cryptographic
consequences.
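
To make the size factor of the reduced-round variant concrete, the following sketch estimates the
resulting gate count. It assumes, as a simplification, that the roughly 124,000 gates are spread
evenly over the 80 rounds; the function name and the sample values of N are illustrative only.

```python
FULL_ROUNDS = 80
FULL_GATES = 124_000  # approximate size of the full SHA-512 circuit


def reduced_round_estimate(n_skipped: int):
    """Return (size reduction factor, estimated gates) when the client
    supplies the state after the first n_skipped rounds."""
    if not 0 <= n_skipped < FULL_ROUNDS:
        raise ValueError("n_skipped must be in [0, 80)")
    remaining = FULL_ROUNDS - n_skipped
    factor = FULL_ROUNDS / remaining
    return factor, round(FULL_GATES / factor)


for n in (0, 20, 40, 60):
    factor, gates = reduced_round_estimate(n)
    print(f"N={n:2d}: factor {factor:.2f}, about {gates:,} gates")
```

Under this simplification, skipping half of the rounds halves the circuit, which is also the point
at which half of the hash function's rounds no longer protect a compromised password database.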

Overall, we believe these results indicate that our technique is suitable for common use in real
applications, especially on systems based on a simpler hash function than SHA-512, such as the common
MD5 standard.

We note that the performance is sensitive to available processing power due to the many
cryptographic primitives employed. For example, our implementation takes advantage of parallelization
on multi-core processors to encrypt multiple circuits in parallel as the server performs its circuit
verifications. Because the verification of each circuit is independent, parallel scaling is extremely
efficient with respect to available processors, potentially allowing high-security authentication
with minimal delays to clients on server machines with enough processing power.
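
Because each circuit is verified independently, the checks map naturally onto a worker pool. The
sketch below is illustrative and is not the prototype's C++ implementation: `verify_circuit` is a
hypothetical placeholder that only compares a byte string against its hash, and a Python thread pool
stands in for the multi-core scheduling.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def verify_circuit(item) -> bool:
    """Hypothetical per-circuit check: compare a circuit against its hash."""
    circuit, expected_hash = item
    return hashlib.sha256(circuit).digest() == expected_hash


def verify_all(items, workers=None) -> bool:
    """Verify circuits concurrently; each check is independent of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(verify_circuit, items))


# Random byte strings stand in for garbled circuits.
circuits = [os.urandom(1 << 16) for _ in range(30)]
items = [(c, hashlib.sha256(c).digest()) for c in circuits]
print(verify_all(items, workers=os.cpu_count()))
```

Since no check depends on another, the work divides evenly across however many workers are available.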

Overall, we believe that the results we have achieved so far demonstrate the potential of this
technique as a practical and secure addition to the body of research in secure password authentication.

6  Conclusion 

In this paper, we have addressed the problem of providing secure mutual authentication for the SSH
protocol, maintaining full backwards compatibility with existing server/client infrastructures as a
primary requirement. Leveraging the unique flexibility of SFE to compute arbitrary hash functions within
the protocol, we constructed a protocol that provides the same security guarantees as previous
work in the area, while remaining suitable as a “drop-in” authentication module on nearly all existing
servers. Furthermore, we demonstrated that a conservative set of optimizations to the protocol made
it practical for common use from a performance standpoint. In the future, we intend to study further
optimizations to certain parts of the protocol that improve performance without sacrificing security, as
well as additional applications that could benefit from our basic technique.